5 2022-03-17 und-Mymr .

Proposal for Root Zone Label Generation Rules for the Myanmar script

Overview

This file contains Label Generation Rules (LGR) for the Myanmar script for the Root zone. For more details on this LGR and additional background on the script, see "Proposal for a Myanmar Script Root Zone Label Generation Rule-Set (LGR)" [Proposal-Myanmar]. This file is one of a set of LGR files that together form an integrated LGR for the DNS Root Zone [RZ-LGR-5]. The format of this file follows [RFC 7940].

Repertoire

The Root Zone LGR for the Myanmar script lists 163 entries in total, consisting of 98 unique Myanmar script code points and 65 sequences. The code point U+1063 Myanmar Tone Mark Sgaw Karen Hathi is not listed separately, but is available as part of a defined sequence, bringing the total to 99 distinct code points.

The repertoire includes code points used by languages written in the Myanmar script that fall within levels 1 to 4 on the [EGIDS] scale; EGIDS 5 languages with more than 500,000 users are also included in the analysis. They are Burmese, Shan, Rakhine, S'gaw Karen, Mon and Pa'O Karen. (See also [Ethnologue].) A non-exhaustive list of languages using each code point can be found in the comments. For more details, see Section 5 "Repertoire" in [Proposal-Myanmar].

Note: In this proposal, to avoid confusion, the term 'Myanmar' is used for the Myanmar script and the term 'Burmese' is used for the Myanmar language.

The repertoire is based on [MSR-5], which is a subset of [Unicode 11.0].

Code points outside the Myanmar script that are listed in this file are targets for out-of-script variants and are identified by a reflexive (identity) variant of type "out-of-repertoire-var". They do not form part of the repertoire.

As part of the Root Zone, this LGR includes neither digits nor the HYPHEN-MINUS.

Each code point or range is tagged with the script or scripts that the code point is used with, and one or more references documenting sufficient justification for inclusion in the repertoire; see "References" below. For code points that are part of the repertoire, comments identify the languages using the code point.

Variants

According to Section 6, "Variants", in [Proposal-Myanmar], this LGR defines the following variants:

In-script variants: Variants are defined based on identical appearance or because they are language-based or otherwise analogues of the same code point or sequence (semantic variants).

The disposition for some variants is "blocked": only a single version of the label should be in the root-zone at one time, while all other variants are excluded. The disposition for the remaining in-script variants is "allocatable", allowing more than one variant to be delegated to the same entity. See Section 6.1 in the [Proposal-Myanmar].

Variants that have the same meaning, pronunciation and property should not both appear in the same string. Based on the language, only one variant should be chosen and written consistently. The LGR contains rules and other constraints on variants that prohibit mixed-language labels and arbitrary mixtures of variants. See Section 7 in the [Proposal-Myanmar].

Some additional code point combinations could create visual variants; however, the WLE rules disallow these combinations. They are therefore not normatively defined as variants, but are listed in Appendix A of [Proposal-Myanmar].

Cross-script variants: Some Myanmar characters look the same as characters in Malayalam, Oriya and Georgian scripts. See Section 6.2 in the [Proposal-Myanmar].

This LGR inherits additional cross-script variants by integration; they may not be listed here unless they result in in-script variants. See the merged, Common LGR [RZ-LGR-5] for details of all applicable cross-script variants, including any not listed here; always use the Common LGR for determining cross-script collisions of labels.

The specification of variants in the Root Zone LGR follows the guidelines in [RFC 8228].

Character Classes

The syllable principle is at the heart of the Myanmar script writing system. The general order of elements in a syllable is Consonant > (Medials) > Vowels > (Consonant) > Tone. Overall, the Myanmar script is composed of:

Consonants (c). The following sub-categories of c are also defined:
- c1, U+103F (GREAT SA, theoretical combination of two Myanmar Letter Sa)
- c2, a set of consonants to be combined with S16
- c3, a set of consonants to be combined with S17
Independent Vowels (iv)
Dependent Vowels (dv). Some of the dv are also assigned to the following sub-categories:
- Long Vowel (lv): U+102B, U+102C, U+102E, U+1030, U+1031, U+1032, and U+1036 Anusvara
- Short Vowel (sv): U+102D, U+102F
- Anusvara (a): U+1036
- Shan Vowel (sh_vowel)
- Short Tone (t_short)
- Long Tone (t_long)
Killer or Asat (k)
Virama (virama)
Medials (m)
- Mon medials (M_mon)
- Shan medial (M_shan)
Tone mark and Signs:
- Shan Tone (sh_tone)
- Pao Tone (pao_tone)
- Sgaw Tone (skaw_tone)
Other Various Signs (ov)

Consonants: Consonants usually stay at the head of each syllable. A consonant can be stand-alone or be followed by Medials, Dependent Vowels, other Signs or Tone Mark. A Consonant cannot be between Viramas to prevent the invalid case of c+v+c+v+c. See Section 3.3.1, "The Consonants" of the [Proposal-Myanmar].

Independent Vowels: Only Burmese, Mon and Pa’O languages use Independent Vowels. An Independent Vowel can be at any position in a label. See Section 3.3.2, "The Independent Vowels" of the [Proposal-Myanmar].

Dependent Vowels: Dependent vowel signs add a vowel property to consonants. These signs appear in top/below/left/right positions of a center consonant or consonant+medial. The dotted-circle indicates where the centre character would be. Dependent vowel signs cannot be repeated, and Dependent vowels cannot be adjacent to each other unless within sequences defined in Table 8-A. Dependent vowels also cannot be followed by Asat (U+103A) unless within sequences defined in Table 8-A. See Section 3.3.3, "The Diacritic - Dependent Vowels" of the [Proposal-Myanmar].

Medials: Medials are used to enhance the sound of Consonants. They are also noted as Dependent Consonants as they need a leading Consonant to attach to. All five languages in this proposal use Medials. See Section 3.3.4, "Diacritic - Medials" of the [Proposal-Myanmar].

Virama: U+1039 MYANMAR SIGN VIRAMA is used in Burmese and Mon. Virama has two properties: it acts as a killer (devoweliser) and as a joiner for syllable chaining. The virama causes the consonant after it to be rendered below the consonant before it. The pattern of syllable chaining is Consonant + Virama + Consonant. However, a repetition of Consonant + Virama + Consonant is not allowed, to prevent rendering issues. See Section 3.3.5, "Diacritic - Tone Marks and Other Signs" of the [Proposal-Myanmar].
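The virama constraint described above can be sketched in Python as follows. This is an illustration only, not the normative rule: the function names are hypothetical, and the consonant class c is approximated by the range U+1000 to U+1021, which is an assumption (the LGR's actual class c is defined in the rules section and includes further code points).

```python
VIRAMA = "\u1039"  # MYANMAR SIGN VIRAMA

def is_consonant(ch: str) -> bool:
    # Assumption: approximate class "c" by U+1000..U+1021 only.
    return len(ch) == 1 and "\u1000" <= ch <= "\u1021"

def virama_contexts_ok(label: str) -> bool:
    """Check that every virama sits between two consonants (c+v+c)
    and that no consonant sits between two viramas (c+v+c+v+c)."""
    positions = [i for i, ch in enumerate(label) if ch == VIRAMA]
    for i in positions:
        # A virama needs a consonant on both sides.
        if i == 0 or i == len(label) - 1:
            return False
        if not (is_consonant(label[i - 1]) and is_consonant(label[i + 1])):
            return False
    # Two viramas separated by exactly one code point form v+c+v.
    for first, second in zip(positions, positions[1:]):
        if second - first == 2:
            return False
    return True
```

For example, virama_contexts_ok("\u1000\u1039\u1000") accepts a single stacked pair, while the chained form "\u1000\u1039\u1000\u1039\u1000" is rejected.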

Killer or Asat: U+103A MYANMAR SIGN ASAT is used in Burmese and Mon. This sign is used to remove the consonant sound of a letter and take only the vowel property to create more vowel sounds out of consonants. Except for defined sequences, Asat cannot follow other Diacritics. See Section 3.3.5, "Diacritic - Tone Marks and Other Signs" of the [Proposal-Myanmar].

Long Tone (t_long): U+1038 MYANMAR SIGN VISARGA appears at the end of the syllable. It creates a vowel sound with the higher tone. It follows a Consonant, Medial, Long vowel or the sequence U+102D U+102F. See Section 3.3.5.1, "Burmese Tone Marks and Other Signs" of the [Proposal-Myanmar].

Short Tone (t_short): U+1037 MYANMAR SIGN DOT BELOW appears at the end of the syllable. It creates a vowel sound with the higher tone. It follows a Consonant, Medial, Long vowel or the sequence U+102D U+102F. See Section 3.3.5.1, "Burmese Tone Marks and Other Signs" of the [Proposal-Myanmar].
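The context requirement for the two tone marks can be illustrated with a short Python sketch. The long vowel list is taken from the class list above; the consonant range U+1000 to U+1021 and the medial set (the four medials U+103B to U+103E plus the Mon and Shan medials named in this document) are simplifying assumptions, and all function names are hypothetical.

```python
TONES = {"\u1037", "\u1038"}            # t_short, t_long
LONG_VOWELS = set("\u102b\u102c\u102e\u1030\u1031\u1032\u1036")
# Assumption: medials approximated by the code points named in this document.
MEDIALS = set("\u103b\u103c\u103d\u103e\u105e\u105f\u1060\u1082")
SHORT_VOWEL_SEQUENCE = "\u102d\u102f"   # the sequence U+102D U+102F

def is_consonant(ch: str) -> bool:
    # Assumption: approximate class "c" by U+1000..U+1021 only.
    return len(ch) == 1 and "\u1000" <= ch <= "\u1021"

def tone_ok(label: str, i: int) -> bool:
    """Check the code point(s) preceding the tone mark at index i."""
    if i == 0:
        return False  # a tone mark cannot start a label
    prev = label[i - 1]
    if is_consonant(prev) or prev in MEDIALS or prev in LONG_VOWELS:
        return True
    # Otherwise the tone mark must directly follow U+102D U+102F.
    return label[max(0, i - 2):i] == SHORT_VOWEL_SEQUENCE

def tones_ok(label: str) -> bool:
    return all(tone_ok(label, i) for i, ch in enumerate(label) if ch in TONES)
```

With these assumptions, a tone mark after a lone short vowel U+102D is rejected, while the same tone mark after U+102D U+102F is accepted.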

Other classes defined for use in WLE and context rules

Shan Tone (sh_tone), Shan Vowel (sh_vowel)
C_103B, a set of consonants that can be followed by medial YA U+103B
C_103C, a set of consonants that can be followed by medial RA U+103C
C_103E, a set of consonants that can be followed by medial HA U+103E
C_n103D, a set of consonants that cannot be followed by medial WA U+103D
C_mon, a set of Mon consonants that can be followed by Mon medial U+105E, U+105F or U+1060
C_shan, a set of Shan consonants that can be followed by Shan medial WA U+1082
CMM1, a set of consonants that can be followed by medial sequence U+103B U+103D
CMM3, a set of consonants that can be followed by medial sequence U+103C U+103D
CMM5, a set of consonants that can be followed by medial sequence U+103D U+103E
cp1002cp1015cp101D, a set of consonants that cannot be followed by vowel AA U+102C

Whole Label Evaluation (WLE) and Context Rules

Default Whole Label Evaluation Rules and Actions

The LGR includes the set of required default WLE rules and actions applicable to the Root Zone and defined in [MSR-5]. They are marked with ⍟. The default prohibition on leading combining marks is equivalent to ensuring that a label only starts with a consonant or vowel.
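As an illustration of the default prohibition on leading combining marks, the following Python sketch tests the first code point of a label. It assumes that "combining mark" means Unicode General_Category Mn or Mc; the function name is hypothetical, and the normative rule is the one defined in [MSR-5].

```python
import unicodedata

def starts_with_combining_mark(label: str) -> bool:
    """Return True if the first code point is a combining mark.

    Assumption: "combining mark" is taken to mean General_Category
    Mn (nonspacing) or Mc (spacing combining).
    """
    if not label:
        return False
    return unicodedata.category(label[0]) in ("Mn", "Mc")
```

A label beginning with a dependent vowel sign such as U+102C would be flagged, whereas a label beginning with the consonant U+1000 would not.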

Myanmar-specific Rules

There are constraints on the context for many of the character classes in Myanmar. These constraints enforce the syllable structure to the degree needed for stability of rendering (which affects both security and usability) without enforcing other linguistic constraints or spellings. These constraints are implemented via a set of context and whole label rules formulated for LGR specification, as described in Section 7, "Whole Label Evaluation (WLE) Rules" in [Proposal-Myanmar].

The following shorthand names for sequences are used in the description or definition of these rules:

S11 — Myanmar letter NGA followed by Asat and Virama
S12, S14, S15 — long-vowel sequences
S16, S17 — Asat sequences
ST4 — a Pa'O Karen tone sequence
ST1, ST2, ST3 — Skaw Karen tone sequences
S_Mon4 — Mon Kinzi
S_Mon5 through S_Mon16 — Mon dependent vowel sequences

The rules are:

dv: must follow c or m
Note: dv includes lv, sv, and ov, as well as the starting code point in sequences S12, S13, S14, S15, S18, S19, S20, S_Mon3, S_Mon5, S_Mon6, S_Mon7, S_Mon8, S_Mon9, S_Mon10, S_Mon11, S_Mon12, S_Mon16, S_Sh1, S_Sh2, S_Sh3, S_Sh4, S_Sh5, S_Pao.
Anusvara: must follow c or m
Rules for Medials combining with Consonants:
- Rules for Single Medials
  - U+103B must follow consonant C_103B
  - U+103C must follow consonant C_103C
  - U+103E must follow consonant C_103E
  - U+103D must follow any consonant except C_n103D
  - M_mon must follow C_mon
  - M_shan must follow C_shan
- Rules for Combined Medials
  - MM1 must follow CMM1
  - MM3 must follow CMM3
  - MM5 must follow CMM5
  - S_Mon13 must follow C_103E
  - S_Mon14 must follow C_103E
  - S_Mon15 must follow C_103E
U+103F MYANMAR LETTER GREAT SA must follow c or m or dv or U+1023 or U+1025
(c + k) or (c2 + S16) or (c3 + S17) must follow c or m or dv or ov
S11 must follow c or m or dv and another c must follow S11
virama: must be between two c (c+v+c), but a c cannot sit between two viramas, which prevents c+v+c+v+c
t_long and t_short: must follow c or m or lv or S12
sh_tone must follow sh_vowel or (c + k) or S_Sh2 or S_Sh5
pao_tone must follow dv or m or k, except U+1037 or U+1038
ST4 must follow U+1031, U+1032, S12, S14, or S15
ST1, ST2, ST3 or Sgaw_Tone must follow c or m or dv
S_Mon4 must follow c or m or dv and another c must follow S_Mon4
U+102C cannot follow any of the three consonants U+1002, U+1015, U+101D

These rules are implemented as required or prohibited contexts for the respective repertoire elements.
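As a small example of a prohibited context, the rule that U+102C cannot follow U+1002, U+1015 or U+101D can be sketched in Python. The code points come from the rule list above; the function and constant names are hypothetical.

```python
VOWEL_AA = "\u102c"
# The three consonants after which U+102C is prohibited.
NO_AA_AFTER = {"\u1002", "\u1015", "\u101d"}

def aa_context_ok(label: str) -> bool:
    """Return False if any U+102C directly follows U+1002, U+1015 or U+101D."""
    for i, ch in enumerate(label):
        if ch == VOWEL_AA and i > 0 and label[i - 1] in NO_AA_AFTER:
            return False
    return True
```

The required contexts in the list (for example "dv: must follow c or m") follow the same pattern with the test inverted: the preceding code point must belong to the named class.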

No-Mix Rules

According to Table 10 of [Proposal-Myanmar], there are code points which cannot both occur in the same label. The following WLE rules enforce these constraints:

no-mix-mm-i-and-mm-ka-v-ka
no-mix-mm-ha-asat-and-mm-pa-aa-asat
no-mix-sk-sha-and-mm-ra-mha
no-mix-mm-kha-and-shan-kha
no-mix-mm-and-mon — the use of any Mon-specific code point requires that all other code points that have a Mon-specific equivalent must use that one over the standard Myanmar code point in that label.

These rules are implemented as WLE rules which trigger a corresponding action.
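The general shape of a no-mix check can be sketched in Python as below. The function name is hypothetical, and the two example sets are illustrative placeholders chosen for this sketch; they are not taken from Table 10 of [Proposal-Myanmar], which defines the actual pairs.

```python
def violates_no_mix(label: str, set_a: set, set_b: set) -> bool:
    """Return True if the label contains members of both sets."""
    present = set(label)
    return bool(present & set_a) and bool(present & set_b)

# Illustrative placeholder sets (assumption, not the Table 10 contents).
EXAMPLE_SET_A = {"\u1001"}
EXAMPLE_SET_B = {"\u1076"}
```

A label that draws from only one of the two sets passes; a label that contains at least one member of each set would trigger the corresponding action.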

Context rules for Variants

The following prohibited context applies to certain variants:

followed-by-c-end — a variant relation does not exist between U+1004 and U+105A if followed by a consonant or end of label
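A minimal Python sketch of this prohibited context follows. The function names are hypothetical, and the consonant class is approximated by U+1000 to U+1021, which is an assumption; the LGR's own class c is broader.

```python
CP_1004 = "\u1004"
CP_105A = "\u105a"

def is_consonant(ch: str) -> bool:
    # Assumption: approximate class "c" by U+1000..U+1021 only.
    return len(ch) == 1 and "\u1000" <= ch <= "\u1021"

def variant_relation_applies(label: str, i: int) -> bool:
    """Return True if the U+1004 / U+105A variant relation applies at index i.

    The relation is suppressed when the code point is followed by a
    consonant or stands at the end of the label.
    """
    if label[i] not in (CP_1004, CP_105A):
        raise ValueError("index does not point at U+1004 or U+105A")
    following = label[i + 1:i + 2]  # empty string at end of label
    if following == "" or is_consonant(following):
        return False
    return True
```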

Myanmar-specific actions

The no-mix rules trigger Myanmar-specific actions to invalidate any original and variant labels not satisfying the constraints. See Section 7, "Whole Label Evaluation (WLE) Rules" in [Proposal-Myanmar].

In addition, to limit the number of allocatable variant labels, further constraints on variants are included. These allow allocatable variant labels to contain either members of Grapheme-set-1 (left column in Table 10) or members of Grapheme-set-2 (right column in Table 10), but not a mixture of members from both sets. The code points in the left column are either the simpler forms (shorter sequence) or the lower Unicode code point values. A mixture of code points from different sets is, however, possible in the original, applied-for label. See Section 6.1 "In-script Variants" in [Proposal-Myanmar].

Each code point or sequence in grapheme-set-1 has been given the reflexive variant type "r-set1" and each code point or sequence in grapheme-set-2 has been given the reflexive variant of type "r-set2". (By convention, the prefix “r-” marks a type used in a reflexive variant mapping, that is, it represents an instance of the original code point at that location in a variant label; see Section 5.3.4 in [RFC 7940].)

A variant mapping from a member of grapheme-set-1 to a member of grapheme-set-2 is of type "set1-to-set2", while the variant type for mapping from grapheme-set-2 to grapheme-set-1 is of type "set2-to-set1".

Script-specific actions evaluate these variant types to ensure the following constraints:

blocked — a variant label containing a blocked variant will receive a disposition of "blocked".
r-set1 r-set2 — a label containing one or more of these reflexive variant types and no others represents an original label and receives a disposition of "allocatable".
r-set1 set2-to-set1 — a label containing one or more of these variant types and no others receives a disposition of "allocatable".
r-set2 set1-to-set2 — a label containing one or more of these variant types and no others receives a disposition of "allocatable".
set1-to-set2 set2-to-set1 — a label containing a mix of these variant types receives a disposition of "blocked".

See Section 6.1 "In-script Variants" and Section 7, "Whole Label Evaluation (WLE) Rules" in [Proposal-Myanmar].
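The evaluation of variant types listed above can be sketched in Python as follows. The function name is hypothetical, and the sketch covers only the five cases listed; it returns None for any other combination, because this description does not state how those are resolved.

```python
ALLOCATABLE_COMBINATIONS = (
    {"r-set1", "r-set2"},        # original label
    {"r-set1", "set2-to-set1"},  # variant label using only set 1 members
    {"r-set2", "set1-to-set2"},  # variant label using only set 2 members
)

def disposition(variant_types):
    """Map the set of variant types found in a label to a disposition."""
    types = set(variant_types)
    if "blocked" in types:
        return "blocked"
    for allowed in ALLOCATABLE_COMBINATIONS:
        # "one or more of these variant types and no others"
        if types and types <= allowed:
            return "allocatable"
    if {"set1-to-set2", "set2-to-set1"} <= types:
        return "blocked"
    return None  # not covered by the cases listed in this description
```

For example, disposition({"r-set1", "set2-to-set1"}) yields "allocatable", while a label mixing "set1-to-set2" and "set2-to-set1" yields "blocked".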

Methodology and Contributors

The Root Zone LGR for the Myanmar script was developed by the Myanmar Generation Panel. For additional detail on methodology and contributors see Sections 4 and 8 in [Proposal-Myanmar], as well as [RZ-LGR-5-Overview].

References

[EGIDS]: Lewis and Simons, “EGIDS: Expanded Graded Intergenerational Disruption Scale,” documented in [SIL-Ethnologue] and summarized here: https://en.wikipedia.org/wiki/Expanded_Graded_Intergenerational_Disruption_Scale_(EGIDS)
[Ethnologue]: Ethnologue, Myanmar, (Accessed 6 October 2019) https://www.ethnologue.com/country/MM
[MSR-5]: Integration Panel, "Maximal Starting Repertoire — MSR-5 Overview and Rationale", 24 June 2021, https://www.icann.org/en/system/files/files/msr-5-overview-24jun21-en.pdf
[Proposal-Myanmar]: Myanmar Generation Panel, “Proposal for a Myanmar Script Root Zone Label Generation Rule-Set (LGR)”, 17 March 2022, https://www.icann.org/en/system/files/files/proposal-myanmar-lgr-17mar22-en.pdf
[RFC 7940]: Davies, K. and A. Freytag, "Representing Label Generation Rulesets Using XML", RFC 7940, August 2016, http://www.rfc-editor.org/info/rfc7940
[RFC 8228]: A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", RFC 8228, August 2017, https://www.rfc-editor.org/info/rfc8228
[SIL-Ethnologue]: David M. Eberhard, Gary F. Simons & Charles D. Fennig (eds.). 2021. Ethnologue: Languages of the World, Twenty-fourth edition. Dallas, Texas: SIL International. Online version available as http://www.ethnologue.com
[Unicode 11.0]: The Unicode Consortium. The Unicode Standard, Version 11.0.0, (Mountain View, CA: The Unicode Consortium, 2018. ISBN 978-1-936213-19-1) http://www.unicode.org/versions/Unicode11.0.0/

For more details on references [101] and up, refer to the Table of References below.

The Unicode Standard 1.1
The Unicode Standard 3.0
The Unicode Standard 5.1
The Unicode Standard 5.2
Section "Burmese", p. 21ff in "Representing Myanmar in Unicode", UTN#11, Details and Examples, Version 4, https://www.unicode.org/notes/tn11/UTN11_4.pdf
Section "Mon", p. 31ff in "Representing Myanmar in Unicode", UTN#11, Details and Examples, Version 4, https://www.unicode.org/notes/tn11/UTN11_4.pdf
Section "Shan", p. 41ff in "Representing Myanmar in Unicode", UTN#11, Details and Examples, Version 4, https://www.unicode.org/notes/tn11/UTN11_4.pdf
Section "Pa'o Karen", p. 37ff in "Representing Myanmar in Unicode", UTN#11, Details and Examples, Version 4, https://www.unicode.org/notes/tn11/UTN11_4.pdf
Section "Sgaw Karen", p. 33ff in "Representing Myanmar in Unicode", UTN#11, Details and Examples, Version 4, https://www.unicode.org/notes/tn11/UTN11_4.pdf

1004 1008 102E 1033 105A-105B 1033 105A-105B 1004 1008 102E

1037-1038

1002 1015 101D