This file contains Label Generation Rules (LGR) for the Myanmar script for the Root Zone. For more details on this LGR and additional background on the script, see "Proposal for a Myanmar Script Root Zone Label Generation Rule-Set (LGR)" [Proposal-Myanmar]. This file is one of a set of LGR files that together form an integrated LGR for the DNS Root Zone [RZ-LGR-5]. The format of this file follows [RFC 7940].
The Root Zone LGR for the Myanmar script lists 163 entries in total, consisting of 98 unique Myanmar script code points and 65 sequences. The code point U+1063 Myanmar Tone Mark Sgaw Karen Hathi is not listed separately, but is available as part of a defined sequence, bringing the total to 99 distinct code points.
The repertoire includes code points used by languages written in the Myanmar script that fall within levels 1 to 4 on the [EGIDS] scale, as well as EGIDS 5 languages that have more than 500,000 users. They are Burmese, Shan, Rakhine, S'gaw Karen, Mon, and Pa'O Karen. (See also [Ethnologue].) A non-exhaustive list of languages using each code point can be found in the comments. For more details, see Section 5 "Repertoire" in [Proposal-Myanmar].
Note: In this proposal, to avoid confusion, the term 'Myanmar' is used for the Myanmar script and the term 'Burmese' is used for the Myanmar language.
The repertoire is based on [MSR-5], which is a subset of [Unicode 11.0].
Code points outside the Myanmar script that are listed in this file are targets for out-of-script variants and are identified by a reflexive (identity) variant of type "out-of-repertoire-var". They do not form part of the repertoire.
As part of the Root Zone, this LGR includes neither digits nor the HYPHEN-MINUS.
Each code point or range is tagged with the script or scripts that the code point is used with, and one or more references documenting sufficient justification for inclusion in the repertoire; see "References" below. For code points that are part of the repertoire, comments identify the languages using the code point.
According to Section 6, "Variants", in [Proposal-Myanmar], this LGR defines the following variants:
In-script variants: Variants are defined based on identical appearance or because they are language-based or otherwise analogues of the same code point or sequence (semantic variants).
The disposition for some variants is "blocked": only a single version of the label should be in the root-zone at one time, while all other variants are excluded. The disposition for the remaining in-script variants is "allocatable", allowing more than one variant to be delegated to the same entity. See Section 6.1 in the [Proposal-Myanmar].
Variants that have the same meaning, pronunciation and property should not both appear in the same string. Based on the language, only one variant should be chosen and written consistently. The LGR contains rules and other constraints on variants that prohibit mixed-language labels and arbitrary mixtures of variants. See Section 7 in the [Proposal-Myanmar].
Some additional code point combinations could create visual variants; however, the WLE rules disallow these combinations. They are therefore not normatively defined as variants, but are listed in Appendix A of [Proposal-Myanmar].
Cross-script variants: Some Myanmar characters look the same as characters in the Malayalam, Oriya and Georgian scripts. See Section 6.2 in the [Proposal-Myanmar].
This LGR inherits additional cross-script variants by integration; they may not be listed here unless they result in in-script variants. See the merged, Common LGR [RZ-LGR-5] for details of all applicable cross-script variants, including any not listed here; always use the Common LGR for determining cross-script collisions of labels.
The specification of variants in the Root Zone LGR follows the guidelines in [RFC 8228].
The syllable principle is at the heart of the Myanmar script writing system. The general order of elements in a syllable is Consonant > (Medials) > Vowels > (Consonant) > Tone. Overall, the Myanmar script is composed of:
Consonants: Consonants usually stay at the head of each syllable. A consonant can be stand-alone or be followed by Medials, Dependent Vowels, other Signs or Tone Mark. A Consonant cannot be between Viramas to prevent the invalid case of c+v+c+v+c. See Section 3.3.1, "The Consonants" of the [Proposal-Myanmar].
Independent Vowels: Only Burmese, Mon and Pa’O languages use Independent Vowels. An Independent Vowel can be at any position in a label. See Section 3.3.2, "The Independent Vowels" of the [Proposal-Myanmar].
Dependent Vowels: Dependent vowel signs add a vowel property to consonants. These signs appear in top/below/left/right positions of a center consonant or consonant+medial. The dotted circle indicates where the center character would be. Dependent vowel signs cannot be repeated, and Dependent Vowels cannot be adjacent to each other unless within sequences defined in Table 8-A. Dependent Vowels also cannot be followed by Asat (U+103A) unless within sequences defined in Table 8-A. See Section 3.3.3, "The Diacritic - Dependent Vowels" of the [Proposal-Myanmar].
Medials: Medials are used to enhance the sound of Consonants. They are also noted as Dependent Consonants as they need a leading Consonant to attach to. All five languages in this proposal use Medials. See Section 3.3.4, "Diacritic - Medials" of the [Proposal-Myanmar].
Virama: U+1039 MYANMAR SIGN VIRAMA is used in Burmese and Mon. Virama has two properties: it acts as a killer (devoweliser) and as a joiner for syllable chaining. The virama causes the consonant after it to be rendered below the consonant before it. The pattern of syllable chaining is Consonant + Virama + Consonant. However, a repetition of Consonant + Virama + Consonant is not allowed, to prevent rendering issues. See Section 3.3.5, "Diacritic - Tone Marks and Other Signs" of the [Proposal-Myanmar].
Killer or Asat: U+103A MYANMAR SIGN ASAT is used in Burmese and Mon. This sign is used to remove the consonant sound of a letter and take only the vowel property to create more vowel sounds out of consonants. Except for defined sequences, Asat cannot follow other Diacritics. See Section 3.3.5, "Diacritic - Tone Marks and Other Signs" of the [Proposal-Myanmar].
Long Tone (t_long): U+1038 MYANMAR SIGN VISARGA appears at the end of the syllable. It creates a vowel sound with the higher tone. It follows a Consonant, Medial, Long vowel or the sequence U+102D U+102F. See Section 3.3.5.1, "Burmese Tone Marks and Other Signs" of the [Proposal-Myanmar].
Short Tone (t_short): U+1037 MYANMAR SIGN DOT BELOW appears at the end of the syllable. It creates a vowel sound with the higher tone. It follows a Consonant, Medial, Long vowel or the sequence U+102D U+102F. See Section 3.3.5.1, "Burmese Tone Marks and Other Signs" of the [Proposal-Myanmar].
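As an illustration of the Virama constraint described above, the following minimal Python sketch flags a label in which a consonant sits between two viramas (the invalid c+v+c+v+c case). It is not the normative rule: the LGR expresses this as context rules in the format of [RFC 7940], and the consonant range U+1000..U+1021 used here is an assumption that stands in for the LGR's full consonant class. The function name is hypothetical.

```python
import re

VIRAMA = "\u1039"  # U+1039 MYANMAR SIGN VIRAMA

# Assumption: U+1000..U+1021 stands in for the LGR's consonant class.
# The class defined in the LGR itself may cover further code points.
CONSONANT = "[\u1000-\u1021]"

# A consonant with a virama on both sides is the invalid c+v+c+v+c case.
BETWEEN_VIRAMAS = re.compile(VIRAMA + CONSONANT + VIRAMA)


def has_repeated_stacking(label: str) -> bool:
    """Return True if some consonant in the label sits between two viramas."""
    return BETWEEN_VIRAMAS.search(label) is not None
```

A single Consonant + Virama + Consonant chain passes this check, while a second chained virama on the stacked consonant is flagged.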
Other classes defined for use in WLE and context rules
The LGR includes the set of required default WLE rules and actions applicable to the Root Zone and defined in [MSR-5]. They are marked with ⍟. The default prohibition on leading combining marks is equivalent to ensuring that a label only starts with a consonant or vowel.
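The default prohibition on leading combining marks can be approximated outside the LGR with a short Python sketch. It assumes that "combining mark" corresponds to the Unicode General Category values Mn, Mc and Me as reported by the standard `unicodedata` module; the normative definition is the one in [MSR-5], and the function name is hypothetical.

```python
import unicodedata


def starts_with_combining_mark(label: str) -> bool:
    """Return True if the first code point has General Category Mn, Mc or Me.

    Assumption: this category test approximates the default WLE rule;
    the normative rule is the one defined in the LGR itself.
    """
    return bool(label) and unicodedata.category(label[0]).startswith("M")
```

For example, a label that begins with the dependent vowel sign U+102C is reported as starting with a combining mark, while a label that begins with the consonant U+1000 is not.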
There are constraints on the context for many of the character classes in Myanmar. These constraints enforce the syllable structure to the degree needed for stability of rendering (which affects both security and usability) without enforcing other linguistic constraints or spellings. These constraints are implemented via a set of context and whole label rules formulated for LGR specification, as described in Section 7, "Whole Label Evaluation (WLE) Rules" in [Proposal-Myanmar].
The following shorthand names for sequences are used in the description or definition of these rules:
The rules are:
These rules are implemented as required or prohibited contexts for the respective repertoire elements.
According to Table 10 of [Proposal-Myanmar], there are code points which cannot both occur in the same label. The following WLE rules enforce these constraints:
These rules are implemented as WLE rules which trigger a corresponding action.
The following prohibited context applies to certain variants:
The no-mix rules trigger Myanmar-specific actions to invalidate any original and variant labels not satisfying the constraints. See Section 7, "Whole Label Evaluation (WLE) Rules" in [Proposal-Myanmar].
In addition, to limit the number of allocatable variant labels, further constraints on variants are included. These allow allocatable variant labels to either contain members of Grapheme-set-1 (left column in Table 10) or Grapheme-set-2 (right column in Table 10) but not a mixture of members from both sets. The code points on the left column are either the simpler forms (shorter sequence) or the lower Unicode code point values. The mixture for code points from different sets, however, is possible in the original, applied-for label. See Section 6.1 "In-script Variants" in [Proposal-Myanmar].
Each code point or sequence in grapheme-set-1 has been given the reflexive variant type "r-set1" and each code point or sequence in grapheme-set-2 has been given the reflexive variant type "r-set2". (By convention, the prefix "r-" marks a type used in a reflexive variant mapping, that is, it represents an instance of the original code point at that location in a variant label; see Section 5.3.4 in [RFC 7940].)
A variant mapping from a member of grapheme-set-1 to a member of grapheme-set-2 is of type "set1-to-set2", while the variant type for mapping from grapheme-set-2 to grapheme-set-1 is of type "set2-to-set1".
Script-specific actions evaluate these variant types to ensure the following constraints:
See Section 6.1 "In-script Variants" and Section 7, "Whole Label Evaluation (WLE) Rules" in [Proposal-Myanmar].
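The effect of the four variant types on a variant label can be sketched in Python as follows. This is an illustration under stated assumptions, not the actions defined in this LGR: it assumes that a position typed "r-set1" or "set2-to-set1" ends up holding a grapheme-set-1 member, that a position typed "r-set2" or "set1-to-set2" ends up holding a grapheme-set-2 member, and that any other type is neutral. The names `SET1_RESULT`, `SET2_RESULT` and `mixes_sets` are hypothetical.

```python
# Variant types after which a position holds a grapheme-set-1 member.
SET1_RESULT = {"r-set1", "set2-to-set1"}
# Variant types after which a position holds a grapheme-set-2 member.
SET2_RESULT = {"r-set2", "set1-to-set2"}


def mixes_sets(variant_types):
    """Return True if a variant label would hold members of both sets.

    variant_types: the variant type applied at each position of the
    variant label. Types outside the four set types are ignored.
    """
    types = set(variant_types)
    return bool(types & SET1_RESULT) and bool(types & SET2_RESULT)
```

Under these assumptions, a variant label for which `mixes_sets` returns True would not be allocatable. The check applies to variant labels only; the original, applied-for label may mix members of both sets.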
The Root Zone LGR for the Myanmar script was developed by the Myanmar Generation Panel. For additional detail on methodology and contributors see Sections 4 and 8 in [Proposal-Myanmar], as well as [RZ-LGR-5-Overview].
For more details on references [101] and up, refer to the Table of References below.
]]>