Test version using a different formulation of the context rules, intended to have the same effect but to be somewhat simpler.
This file contains Label Generation Rules (LGR) for the Malayalam script as would be appropriate for the Root zone. For more details on this proposal see "Proposal for a Malayalam Script Root Zone Label Generation Ruleset (LGR) [Proposal]". The format of this file follows [RFC 7940].
Malayalam was first written with the Vatteluttu alphabet (വട്ടെഴുത്ത് Vaṭṭeḻuttŭ), which means 'round writing' and developed from the Brahmi script. The oldest known written text in Malayalam, the Vazhappalli (or Vazhappally) inscription, is in the Vatteluttu alphabet and dates from about 830 AD. More details in Section 3, "Background on Script and Principal Languages Using It" of the [Proposal].
The basic characters in Malayalam are classified into seven main categories. They are Consonants, Vowels, Matra, Halant, Visargam, Anusvaram and Chillu letters.
Consonant: Malayalam is written in an abugida script derived ultimately from Brāhmī, in which every consonant carries an inherent vowel /a/. More details in Section 3.8, "The Structure of Malayalam Script" of the [Proposal].
Matra: Vowels other than the inherent vowel are written as vowel diacritics. They are referred to as Matras, when they follow consonants. More details in Section 3.8, "The Structure of Malayalam Script" of the [Proposal].
Halant: A consonant can be combined with another consonant or conjunct using the halant encoded as U+0D4D MALAYALAM SIGN VIRAMA. This strips off the implicit vowel. More details in Section 3.8, "The Structure of Malayalam Script" of the [Proposal].
Anusvaram: In Malayalam, the anusvara, written ം (U+0D02), simply represents a consonant /m/ after a vowel, though this /m/ may be assimilated to another nasal consonant. More details in Section 3.8, "The Structure of Malayalam Script" of the [Proposal].
Visargam: വിസർഗം (visargam), or visarga, represents a consonant /h/ after a vowel, and is transliterated as ḥ. Like the anusvara, it is a special symbol, and is never followed by an inherent vowel or another vowel. More details in Section 3.8, "The Structure of Malayalam Script" of the [Proposal].
Chillu: Chillu letters, aka "Chillaksharam", represent pure consonants without any vowel sound. More details in Section 3.8, "The Structure of Malayalam Script" of the [Proposal].
Reordrant: Vowel diacritics, part of which reorder around the preceding character or conjunct. More details in Sections 6.1 "In-script Variants" and 7.1.1 "Variables or definitions" of the [Proposal].
According to Section 5, "Repertoire" in [Proposal], the Malayalam LGR contains 70 unique code points.
The repertoire is based on [MSR-4], which is a subset of Unicode 6.3 [Unicode 6.3].
Each code point has an associated Glyph, Character Name, Indic Syllabic Category and References.
According to Section 6, "Variants" in the [Proposal], this LGR defines one in-script variant, due to the multiple ways to write the conjunct "nta" in Malayalam. This LGR also defines cross-script variants for cases that are "Confusing due to deviation from normally perceived character formations by larger linguistic community". These are not cases of mere visual similarity: they can confuse even a careful observer and are hence proposed as variants.
Variant Disposition: All variants are of type "blocked", making labels that differ only by these variants mutually exclusive: whichever of the labels containing one of these variants is chosen first, the other, equivalent variant label is blocked. There is no preference among these variants.
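The effect of the "blocked" disposition can be sketched as follows. This is a minimal illustration, not the LGR's actual processing: the `Registry` class and `index_label` helper are hypothetical names, the single variant pair (NA + VIRAMA + RRA versus CHILLU N + VIRAMA + RRA) is an assumed encoding of the "nta" variant, and context rules are ignored.

```python
# Hypothetical variant table: each key is rewritten to the form used for comparison.
# The pair below is an ASSUMED encoding of the two spellings of the conjunct "nta".
CANONICAL = {
    "\u0D7B\u0D4D\u0D31": "\u0D28\u0D4D\u0D31",
}


def index_label(label: str) -> str:
    """Map a label to a comparison key shared by all of its variant labels."""
    for variant, canonical in CANONICAL.items():
        label = label.replace(variant, canonical)
    return label


class Registry:
    """Toy registry: the first label of a variant set wins, the rest are blocked."""

    def __init__(self) -> None:
        self._allocated = {}  # comparison key -> label that was allocated first

    def apply(self, label: str) -> str:
        key = index_label(label)
        if key in self._allocated:
            return "blocked"
        self._allocated[key] = label
        return "allocated"
```

Because neither variant is preferred, the outcome depends only on the order of the applications: whichever spelling is applied for first is allocated, and the other is then unavailable.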
Context Rules for Variants: some of the variants defined in this LGR are "effective null variants", that is, some code points in the source map to "nothing" in the target with all other code points unchanged. (Because mappings are symmetric, it does not matter whether it is the forward or reverse mapping that maps to "null"). Such variants require a context rule to keep the variant set well-behaved. Symmetry requires the same context rule for both forward and reverse mappings.
In other cases, the sequences or code points making up source and target are constrained by context rules on the code points. In such a case, any variants require context rules that match the intersection between the contexts for both source and target; otherwise a sequence might be considered valid in some variant label when it would not be valid in an equivalent context in an original label.
The LGR includes the set of required default WLE rules and actions applicable to the Root Zone and defined in [MSR-4]. They are marked with ⍟.
These rules have been formulated so that they can be adopted for LGR specification.
The following symbols are used in the WLE rules:
C → Consonant
M → Matra
V → Vowel
B → Anusvara
X → Visarga
D → Chandrabindu
H → Halant
L → Chillu
R → Reordrant Matra
Note: the Reordrant Matras include one sequence. That requires an auxiliary rule R in addition to class R.
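To make the symbol notation concrete, the sketch below maps each code point of a label to its class symbol. The class membership here is an assumption taken from Unicode block ranges, not from this LGR's repertoire, which is narrower; classes D and R are omitted, and `to_symbols` and `matras_follow_consonants` are hypothetical helper names. The single check shown only illustrates how a rule over symbols might be tested; it is not one of the LGR's rules as specified.

```python
import re

# ASSUMED class ranges, based on the Unicode Malayalam block layout.
# The LGR's own classes are subsets of these; D and R are not modelled.
CLASSES = [
    ("B", 0x0D02, 0x0D02),  # Anusvara
    ("X", 0x0D03, 0x0D03),  # Visarga
    ("V", 0x0D05, 0x0D14),  # Vowels
    ("C", 0x0D15, 0x0D39),  # Consonants
    ("M", 0x0D3E, 0x0D4C),  # Matras
    ("H", 0x0D4D, 0x0D4D),  # Halant (virama)
    ("L", 0x0D7A, 0x0D7F),  # Chillu letters
]


def symbol_for(ch: str) -> str:
    """Return the class symbol for one code point, or '?' if unclassified."""
    cp = ord(ch)
    for symbol, low, high in CLASSES:
        if low <= cp <= high:
            return symbol
    return "?"


def to_symbols(label: str) -> str:
    """Rewrite a label as a string of class symbols, one per code point."""
    return "".join(symbol_for(ch) for ch in label)


def matras_follow_consonants(label: str) -> bool:
    """Illustrative check: every M is immediately preceded by a C."""
    return re.search(r"(?<!C)M", to_symbols(label)) is None
```

Working on the symbol string rather than on raw code points keeps each check a short pattern over the symbols C, M, V, B, X, H and L.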
The rules are:
The following context rules apply to code points U+0D33 and U+0D31 as well as to sequences ending in these code points:
The following context rules apply to variants:
More details in Section 6.1, "In-script Variants" and Section 7, "Whole Label Evaluation Rules (WLE)" of the [Proposal].
Note: the implementation of Rules 7 & 8 relies on the fact that a context rule is not evaluated between code points in the same sequence. For example, if a label contains two adjacent U+0D33 code points surrounded by other code points, they can only be interpreted as the sequence U+0D33 U+0D33 ളള, because a singleton U+0D33 ള is not allowed to be followed by another U+0D33 ള.
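The interplay between sequences and context rules described in this note can be sketched with a small segmenter. Everything here is an assumption made for illustration: the three-element repertoire, the `context_ok` and `segmentations` names, and the simplified rule that an element ending in U+0D33 must not be followed by U+0D33. The rule is checked only at element boundaries, never between the code points of one sequence.

```python
KA = "\u0D15"
LLA = "\u0D33"

# Hypothetical mini-repertoire: two singletons and one sequence.
REPERTOIRE = [KA, LLA, LLA + LLA]


def context_ok(label: str, start: int, element: str) -> bool:
    """ASSUMED rule: an element ending in LLA must not be followed by LLA.

    The check looks only at the code point after the whole element, so it is
    never applied between the two code points of the sequence LLA + LLA.
    """
    end = start + len(element)
    return not (element.endswith(LLA) and label[end:end + 1] == LLA)


def segmentations(label: str, start: int = 0):
    """Yield every way to split the label into valid repertoire elements."""
    if start == len(label):
        yield []
        return
    for element in REPERTOIRE:
        if label.startswith(element, start) and context_ok(label, start, element):
            for rest in segmentations(label, start + len(element)):
                yield [element] + rest
```

For a label such as `KA + LLA + LLA + KA`, the singleton reading of the first U+0D33 is rejected by the rule, so the only surviving segmentation treats the pair as one sequence.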
The Neo-Brahmi Generation Panel covers many different scripts belonging to separate Unicode blocks. Each of these scripts will be assigned a separate LGR; however, the Neo-Brahmi GP will ensure that the fundamental philosophy behind building each LGR is in sync with that of all other Brahmi-derived scripts. This is the Malayalam LGR, which caters to the Malayalam language written using the Malayalam script.
The following references are cited in this document: