Root Zone LGR for script: Lao (Laoo) | lgr-4-lao-script-05nov20-en |
---|---|
This document is mechanically formatted from the above XML file for the LGR. It provides additional summary data and explanatory text. The XML file remains the sole normative specification of the LGR.
Date | 2020-11-05 |
---|---|
LGR Version | 4 |
Language | und-Laoo |
Scope | domain: "." (Root) |
Unicode Version | 6.3.0 |
This file contains Label Generation Rules (LGR) for the Lao script for the Root zone. For more details on this LGR and its development, as well as background on the script, see "Proposal for a Lao Script Root Zone LGR [Proposal-Lao]". This file is one of a set of LGR files that together form an integrated LGR for the DNS Root Zone [RZ-LGR-4]. The format of this file follows [RFC 7940].
In addition to the 51 code points according to Section 5 “Repertoire” in [Proposal-Lao], the sequence 0EB2 0EB0 has been defined to facilitate implementation of WLE rule follows-vafter-context as a context rule. The repertoire only includes code points used by languages that are actively written in the Lao script. The repertoire is based on [MSR-4], which is a subset of [Unicode 6.3].
As part of the Root Zone, this LGR includes neither digits nor the HYPHEN-MINUS.
Each code point or range is tagged with the script or scripts that the code point is used with, one or more categories, and one or more references documenting sufficient justification for inclusion in the repertoire; see "References" below.
According to Section 6, "Variants" in [Proposal-Lao], this LGR defines no variants.
Consonants: In regular syllables, consonants occur in limited combinations. However, arbitrary combinations are used for acronyms. The LGR therefore considers the restriction on syllabic combinations a matter of spelling and does not enforce them. Consonants may be followed by a semi-consonant mark. Some consonants have been given the tag "Cf", which indicates final consonants. See Section 5, "Consonants" in [Proposal-Lao].
Vowels: Vowels are divided into vowel-above, vowel-before, vowel-below and vowel-after so as to enforce some of the syllable structure using context rules. However, many details have been considered spelling issues and, for simplification, are not modeled in this LGR. See Section 5 in [Proposal-Lao].
Semi-consonant: The character U+0EBC (ຼ) LAO SEMIVOWEL SIGN LO follows consonants (see Section 5 in [Proposal-Lao]).
Tone-mark: Any of four tone marks can follow a consonant or vowel-above or vowel-below (see Section 5 in [Proposal-Lao]).
Signs: The character U+0ECC (໌) LAO CANCELLATION MARK follows a final consonant (Cf). The character U+0EC6 (ໆ) LAO KO LA is a repetition mark that can occur at most 3 times, and only at the end of the label (see Section 5 in [Proposal-Lao]).
The LGR includes the set of required default WLE rules and actions applicable to the Root Zone and defined in [MSR-4]. They are marked with ⍟. The default prohibition on leading combining marks is equivalent to ensuring that a label only starts with a consonant or vowel-before.
Rules provided in the LGR as described in Section 7 of [Proposal-Lao] reasonably restrict labels so that they conform to Lao syllable structure. These constraints are presented exclusively as context rules.
The rules are:
No context rules apply to “consonant” code points. For discussion, see Section 5.1, “Consonants” in [Proposal-Lao].
The Root Zone LGR for the Lao script was developed by the Lao Generation Panel. For methodology and contributors, see Sections 4 and 8 in [Proposal-Lao], as well as [RZ-LGR-4-Overview].
The following general references are cited in this document:
For references consulted particularly in designing the repertoire for the Lao script for the Root Zone, please see details in the Table of References below. Reference [0] refers to the Unicode Standard version in which corresponding code points were initially encoded. References [201], [202], [203], [204], [205], [206], and [207] correspond to sources given in [Proposal-Lao] justifying the inclusion of or classification for the corresponding code points. Entries in the table may have multiple source reference values.
Number of elements in Repertoire | 52 |
---|---|
Number of code points | 51 |
Number of sequences | 1 |
Longest code point sequence | 2 |
The following table lists the repertoire by code point (or code point sequence). The data in the Script and Name columns are extracted from the Unicode character database. Where a comment in the original LGR is equal to the character name, it has been suppressed.
See also the legend provided below the table.
Code Point | Glyph | Script | Name | Ref | Tags | Required Context | Comment |
---|---|---|---|---|---|---|---|
U+0E81 | ກ | Lao | LAO LETTER KO | [0], [201], [204] | Cf, consonant | | Lao |
U+0E82 | ຂ | Lao | LAO LETTER KHO SUNG | [0], [201], [204] | consonant | | Lao |
U+0E84 | ຄ | Lao | LAO LETTER KHO TAM | [0], [201], [204] | consonant | | Lao |
U+0E87 | ງ | Lao | LAO LETTER NGO | [0], [201], [204] | Cf, consonant | | Lao |
U+0E88 | ຈ | Lao | LAO LETTER CO | [0], [201], [204] | consonant | | Lao |
U+0E8A | ຊ | Lao | LAO LETTER SO TAM | [0], [201], [204] | Cf, consonant | | Lao |
U+0E8D | ຍ | Lao | LAO LETTER NYO | [0], [201], [204] | Cf, consonant | | Lao |
U+0E94 | ດ | Lao | LAO LETTER DO | [0], [201], [204] | Cf, consonant | | Lao |
U+0E95 | ຕ | Lao | LAO LETTER TO | [0], [201], [204] | consonant | | Lao |
U+0E96 | ຖ | Lao | LAO LETTER THO SUNG | [0], [201], [204] | consonant | | Lao |
U+0E97 | ທ | Lao | LAO LETTER THO TAM | [0], [201], [204] | Cf, consonant | | Lao |
U+0E99 | ນ | Lao | LAO LETTER NO | [0], [201], [204] | Cf, consonant | | Lao |
U+0E9A | ບ | Lao | LAO LETTER BO | [0], [201], [204] | Cf, consonant | | Lao |
U+0E9B | ປ | Lao | LAO LETTER PO | [0], [201], [204] | consonant | | Lao |
U+0E9C | ຜ | Lao | LAO LETTER PHO SUNG | [0], [201], [204] | consonant | | Lao |
U+0E9D | ຝ | Lao | LAO LETTER FO FON | [0], [201], [204] | consonant | | = lao letter fo sung; Lao |
U+0E9E | ພ | Lao | LAO LETTER PHO TAM | [0], [201], [204] | consonant | | Lao |
U+0E9F | ຟ | Lao | LAO LETTER FO FAY | [0], [201], [204] | Cf, consonant | | = lao letter fo tam; Lao |
U+0EA1 | ມ | Lao | LAO LETTER MO | [0], [201], [204] | Cf, consonant | | Lao |
U+0EA2 | ຢ | Lao | LAO LETTER YO | [0], [201], [204] | consonant | | Lao |
U+0EA3 | ຣ | Lao | LAO LETTER RO | [0], [204] | Cf, consonant | | = lao letter lo rada; Lao |
U+0EA5 | ລ | Lao | LAO LETTER LO | [0], [201], [204] | Cf, consonant | | = lao letter lo ling; Lao |
U+0EA7 | ວ | Lao | LAO LETTER WO | [0], [201], [204], [205] | Cf, consonant | | Lao |
U+0EAA | ສ | Lao | LAO LETTER SO SUNG | [0], [201], [204] | Cf, consonant | | Lao |
U+0EAB | ຫ | Lao | LAO LETTER HO SUNG | [0], [201], [204] | consonant | | Lao |
U+0EAD | ອ | Lao | LAO LETTER O | [0], [201], [204], [205] | consonant | | Lao |
U+0EAE | ຮ | Lao | LAO LETTER HO TAM | [0], [201], [204] | consonant | | Lao |
U+0EB0 | ະ | Lao | LAO VOWEL SIGN A | [0], [201], [205], [206] | vowel-after | follows-C-tonemark-vabove | Lao |
U+0EB1 | ັ | Lao | LAO VOWEL SIGN MAI KAN | [0], [201], [205], [206] | vowel-above | follows-main-consonant | Lao |
U+0EB2 | າ | Lao | LAO VOWEL SIGN AA | [0], [201], [205], [206] | vowel-after | follows-C-tonemark-vabove | Lao |
U+0EB2 U+0EB0 | າະ | {Lao} | LAO VOWEL SIGN AA + LAO VOWEL SIGN A | [205] | [vowel-after] + [vowel-after] | follows-vbefore-consonant-cluster | Lao |
U+0EB4 | ິ | Lao | LAO VOWEL SIGN I | [0], [201], [205], [206] | vowel-above | follows-main-consonant | Lao |
U+0EB5 | ີ | Lao | LAO VOWEL SIGN II | [0], [201], [205], [206] | vowel-above | follows-main-consonant | Lao |
U+0EB6 | ຶ | Lao | LAO VOWEL SIGN Y | [0], [201], [205], [206] | vowel-above | follows-main-consonant | Lao |
U+0EB7 | ື | Lao | LAO VOWEL SIGN YY | [0], [201], [205], [206] | vowel-above | follows-main-consonant | Lao |
U+0EB8 | ຸ | Lao | LAO VOWEL SIGN U | [0], [201], [205], [206] | vowel-below | follows-main-consonant | Lao |
U+0EB9 | ູ | Lao | LAO VOWEL SIGN UU | [0], [201], [205], [206] | vowel-below | follows-main-consonant | Lao |
U+0EBB | ົ | Lao | LAO VOWEL SIGN MAI KON | [0], [205] | vowel-above | follows-main-consonant | Lao |
U+0EBC | ຼ | Lao | LAO SEMIVOWEL SIGN LO | [0], [201], [205], [206] | semi-consonant | follows-consonant | = lao semiconsonant lo; Lao |
U+0EBD | ຽ | Lao | LAO SEMIVOWEL SIGN NYO | [0], [201], [205] | vowel-after | follows-C-tonemark-vabove | = lao semivowel ia; Lao |
U+0EC0 | ເ | Lao | LAO VOWEL SIGN E | [0], [201], [205], [206] | vowel-before | precedes-consonant | Lao |
U+0EC1 | ແ | Lao | LAO VOWEL SIGN EI | [0], [201], [205], [206] | vowel-before | precedes-consonant | Lao |
U+0EC2 | ໂ | Lao | LAO VOWEL SIGN O | [0], [201], [205], [206] | vowel-before | precedes-consonant | Lao |
U+0EC3 | ໃ | Lao | LAO VOWEL SIGN AY | [0], [201], [205], [206] | vowel-before | precedes-consonant | Lao |
U+0EC4 | ໄ | Lao | LAO VOWEL SIGN AI | [0], [201], [205], [206] | vowel-before | precedes-consonant | Lao |
U+0EC6 | ໆ | Lao | LAO KO LA | [0], [203] | sign | repetition-mark-limit | = lao may sam; Lao |
U+0EC8 | ່ | Lao | LAO TONE MAI EK | [0], [202] | tone-mark | follows-C-vabove-vbelow | Lao |
U+0EC9 | ້ | Lao | LAO TONE MAI THO | [0], [202] | tone-mark | follows-C-vabove-vbelow | Lao |
U+0ECA | ໊ | Lao | LAO TONE MAI TI | [0], [202] | tone-mark | follows-C-vabove-vbelow | Lao |
U+0ECB | ໋ | Lao | LAO TONE MAI CATAWA | [0], [202] | tone-mark | follows-C-vabove-vbelow | = lao tone mai jattawa; Lao |
U+0ECC | ໌ | Lao | LAO CANCELLATION MARK | [0], [207] | sign | follows-Cf | = lao mark mai ka lan; Lao |
U+0ECD | ໍ | Lao | LAO NIGGAHITA | [0], [201], [205], [206] | vowel-above | follows-main-consonant | = lao vowel sign or; Lao |
Legend
Throughout this LGR, a code point sequence may be annotated with a string in ALL CAPS that is constructed on the same principle as a name for a Unicode Named Sequence. No claim is made that a sequence thus annotated is in fact a named sequence, nor that the annotation in such case actually corresponds to the formal name of a named sequence.
This LGR does not specify any variants.
The following table lists all named and implicit classes with their definition and a list of their members intersected with the current repertoire (for larger classes, this list is elided).
Name | Definition | Count | Members or Ranges | Ref | Comment |
---|---|---|---|---|---|
Cf | Tag=Cf | 14 | {0E81 0E87 0E8A 0E8D 0E94 0E97 0E99-0E9A 0E9F 0EA1 0EA3 0EA5 0EA7 0EAA} | | Any Lao final consonant |
consonant | Tag=consonant | 27 | {0E81-0E82 0E84 0E87-0E88 0E8A 0E8D 0E94-0E97 0E99-0E9F 0EA1-0EA3 0EA5 0EA7 0EAA-0EAB 0EAD-0EAE} | | Any Lao consonant |
semi-consonant | Tag=semi-consonant | 1 | {0EBC} | | Lao semi-consonant LO |
tone-mark | Tag=tone-mark | 4 | {0EC8-0ECB} | | Any Lao tone mark |
vowel-above | Tag=vowel-above | 7 | {0EB1 0EB4-0EB7 0EBB 0ECD} | | Any Lao vowel above |
vowel-below | Tag=vowel-below | 2 | {0EB8-0EB9} | | Any Lao vowel below |
implicit | Tag=sign | 2 | {0EC6 0ECC} | | Any character tagged as sign |
implicit | Tag=vowel-after | 3 | {0EB0 0EB2 0EBD} | | Any character tagged as vowel-after |
implicit | Tag=vowel-before | 5 | {0EC0-0EC4} | | Any character tagged as vowel-before |
implicit | Tag=sc:Laoo | 51 | {0E81-0E82 0E84 0E87-0E88 0E8A 0E8D 0E94-0E97 0E99-0E9F 0EA1-0EA3 0EA5 0EA7 0EAA-0EAB 0EAD-0EAE 0EB0-0EB2 0EB4-0EB9 0EBB-0EBD 0EC0-0EC4 0EC6 0EC8-0ECD} | | Any character tagged as Lao |
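The member listings above use a compact notation of hexadecimal code points and inclusive ranges. As an illustration only, the following Python sketch expands a few of those listings into sets and cross-checks them against the Count column. The names `parse_members` and `CLASSES` are hypothetical helpers introduced here; they are not part of the LGR or of any LGR tooling.

```python
def parse_members(spec: str) -> set[int]:
    """Expand a listing such as '{0EC8-0ECB}' into a set of code points."""
    members: set[int] = set()
    for token in spec.strip("{} ").split():
        first, _, last = token.partition("-")
        low = int(first, 16)
        high = int(last, 16) if last else low  # a single value is a range of one
        members.update(range(low, high + 1))
    return members


# Listings copied from the class table (illustrative subset).
CLASSES = {
    "Cf": "{0E81 0E87 0E8A 0E8D 0E94 0E97 0E99-0E9A 0E9F 0EA1 0EA3 0EA5 0EA7 0EAA}",
    "consonant": "{0E81-0E82 0E84 0E87-0E88 0E8A 0E8D 0E94-0E97 0E99-0E9F "
                 "0EA1-0EA3 0EA5 0EA7 0EAA-0EAB 0EAD-0EAE}",
    "tone-mark": "{0EC8-0ECB}",
    "vowel-above": "{0EB1 0EB4-0EB7 0EBB 0ECD}",
}

if __name__ == "__main__":
    for name, spec in CLASSES.items():
        print(name, len(parse_members(spec)))
```

Expanding the ranges this way is only safe because the table lists members already intersected with the repertoire, so every value inside a printed range is a repertoire code point.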
The following table lists all named rules defined in the LGR and indicates whether they are used as trigger in an action or as context (when or not-when) for a code point or variant.
Name | Regular Expression | Used as Trigger | Anchor | Used as Context | Ref | Comment |
---|---|---|---|---|---|---|
leading-combining-mark | (start)[[\p{gc=Mn}] ∪ [∅=\p{gc=Mc}]] | ✔ | | | | Default WLE rule matching labels with leading combining marks ⍟ |
follows-consonant | ([:consonant:])← ⚓ | | ✔ | C | | WLE Rule 1: A semi-consonant must follow a consonant |
precedes-consonant | ⚓ →([:consonant:]) | | ✔ | C | | WLE Rule 2: A vowel-before precedes a main consonant cluster |
follows-main-consonant | ([:consonant:]\|[:semi-consonant:])← ⚓ | | ✔ | C | | WLE Rule 3: A vowel-above, and vowel-below follow a main consonant C |
follows-C-tonemark-vabove | ([:consonant:]\|[:semi-consonant:]\|[:tone-mark:]\|[:vowel-above:])← ⚓ | | ✔ | C | | WLE Rule 4: A vowel-after follows a main consonant, tone-mark or vowel-above |
consonant-cluster | [:consonant:]{1,2}[:semi-consonant:]? | | | | | Defining consonant cluster for WLE Rule 5 |
follows-vbefore-consonant-cluster | (\u0EC0(:consonant-cluster:))← ⚓ | | ✔ | C | | WLE Rule 5: The sequence U+0EB2 U+0EB0 (າະ) follows a vowel before, and a consonant cluster |
follows-C-vabove-vbelow | ([:consonant:]\|[:semi-consonant:]\|[:vowel-above:]\|[:vowel-below:])← ⚓ | | ✔ | C | | WLE Rule 6: A tone-mark follows a main consonant, vowel-above or vowel-below |
follows-Cf | ([:Cf:])← ⚓ | | ✔ | C | | WLE Rule 7: The sign U+0ECC (໌) can only occur after final consonants |
repetition-mark-limit | ⚓ →(\u0EC6{0,2}(end)) | | ✔ | C | | WLE Rule 8: The sign U+0EC6 (ໆ) can only occur 0 to 3 times at the end of the label |
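To make the anchor notation concrete, the sketch below checks the eight context rules against a candidate label in Python. It is a non-normative illustration under several assumptions: a look-behind rule `(X)← ⚓` is emulated by requiring `X` to match at the end of the text before the anchor, a look-ahead rule `⚓ →(X)` by requiring `X` to match at the start of the text after it, and the sequence U+0EB2 U+0EB0 is tried before its individual code points. The names `RULES`, `CONTEXT`, `rule_holds` and `contexts_ok` are hypothetical; the XML file remains the only normative definition.

```python
import re

# Character classes from the class table, written as regex fragments.
CONS = ("[\u0E81\u0E82\u0E84\u0E87\u0E88\u0E8A\u0E8D\u0E94-\u0E97"
        "\u0E99-\u0E9F\u0EA1-\u0EA3\u0EA5\u0EA7\u0EAA\u0EAB\u0EAD\u0EAE]")
CF = "[\u0E81\u0E87\u0E8A\u0E8D\u0E94\u0E97\u0E99\u0E9A\u0E9F\u0EA1\u0EA3\u0EA5\u0EA7\u0EAA]"
SEMI = "\u0EBC"
TONE = "[\u0EC8-\u0ECB]"
VABOVE = "[\u0EB1\u0EB4-\u0EB7\u0EBB\u0ECD]"
VBELOW = "[\u0EB8\u0EB9]"

# rule name -> (pattern required just before the anchor, pattern required just after)
RULES = {
    "follows-consonant": (CONS, None),
    "precedes-consonant": (None, CONS),
    "follows-main-consonant": (f"{CONS}|{SEMI}", None),
    "follows-C-tonemark-vabove": (f"{CONS}|{SEMI}|{TONE}|{VABOVE}", None),
    "follows-vbefore-consonant-cluster": (f"\u0EC0{CONS}{{1,2}}{SEMI}?", None),
    "follows-C-vabove-vbelow": (f"{CONS}|{SEMI}|{VABOVE}|{VBELOW}", None),
    "follows-Cf": (CF, None),
    "repetition-mark-limit": (None, "\u0EC6{0,2}\\Z"),
}

# Required context per code point, taken from the repertoire table.
CONTEXT = {"\u0EBC": "follows-consonant", "\u0EC6": "repetition-mark-limit",
           "\u0ECC": "follows-Cf"}
CONTEXT.update({c: "follows-C-tonemark-vabove" for c in "\u0EB0\u0EB2\u0EBD"})
CONTEXT.update({c: "follows-main-consonant"
                for c in "\u0EB1\u0EB4\u0EB5\u0EB6\u0EB7\u0EB8\u0EB9\u0EBB\u0ECD"})
CONTEXT.update({c: "precedes-consonant" for c in "\u0EC0\u0EC1\u0EC2\u0EC3\u0EC4"})
CONTEXT.update({c: "follows-C-vabove-vbelow" for c in "\u0EC8\u0EC9\u0ECA\u0ECB"})
SEQUENCE = "\u0EB2\u0EB0"


def rule_holds(name: str, label: str, start: int, end: int) -> bool:
    """Check one rule for the element occupying label[start:end]."""
    before, after = RULES[name]
    if before is not None and not re.search(f"(?:{before})\\Z", label[:start]):
        return False
    if after is not None and not re.match(f"(?:{after})", label[end:]):
        return False
    return True


def contexts_ok(label: str) -> bool:
    """Return True if every element is in the repertoire and meets its context."""
    i = 0
    while i < len(label):
        # Try the two-code-point sequence first (WLE Rule 5).
        if label.startswith(SEQUENCE, i) and rule_holds(
                "follows-vbefore-consonant-cluster", label, i, i + 2):
            i += 2
            continue
        ch = label[i]
        if ch in CONTEXT:
            if not rule_holds(CONTEXT[ch], label, i, i + 1):
                return False
        elif not re.fullmatch(CONS, ch):
            return False  # not a repertoire code point
        i += 1
    return True


if __name__ == "__main__":
    print(contexts_ok("\u0EC0\u0E81\u0EB2\u0EB0"))  # sequence after U+0EC0 + consonant
    print(contexts_ok("\u0E81\u0EB2\u0EB0"))        # same sequence without U+0EC0
```

The first call satisfies Rule 5, so it prints `True`. In the second, the sequence rule fails and U+0EB0 on its own does not follow a consonant, tone-mark or vowel-above, so it prints `False`.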
The following table lists the actions that are used to assign dispositions to labels and variant labels based on the specified conditions. The order of actions defines their precedence: the first action triggered by a label is the one defining its disposition.
# | Condition | Rule / Variant Set | | Disposition | Ref | Comment |
---|---|---|---|---|---|---|
1 | if label matches | leading-combining-mark | → | invalid | | labels with leading combining marks are invalid ⍟ |
2 | if at least one variant is in | {out-of-repertoire-var} | → | invalid | | any variant label with a code point out of repertoire is invalid ⍟ |
3 | if at least one variant is in | {blocked} | → | blocked | | any variant label containing blocked variants is blocked ⍟ |
4 | if each variant is in | {allocatable} | → | allocatable | | variant labels with all variants allocatable are allocatable ⍟ |
5 | if any label (catch-all) | | → | valid | | catch all (default action) ⍟ |
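The first-match-wins ordering of the actions can be sketched in Python as follows. This is a simplified, non-normative illustration: it assumes the label has already passed the repertoire and context-rule checks, it models the leading-combining-mark rule with `unicodedata.category`, and it represents variant types as a plain tuple of strings. The function names `leading_combining_mark` and `disposition` are hypothetical. Because this LGR defines no variants, actions 2 to 4 never trigger for Lao labels; they are included only to show the precedence.

```python
import unicodedata


def leading_combining_mark(label: str) -> bool:
    """True if the label starts with a combining mark (gc=Mn or gc=Mc)."""
    return bool(label) and unicodedata.category(label[0]) in ("Mn", "Mc")


def disposition(label: str, variant_types: tuple[str, ...] = ()) -> str:
    """Return the disposition assigned by the first action that triggers."""
    actions = [
        (lambda: leading_combining_mark(label), "invalid"),
        (lambda: "out-of-repertoire-var" in variant_types, "invalid"),
        (lambda: "blocked" in variant_types, "blocked"),
        (lambda: bool(variant_types)
            and all(t == "allocatable" for t in variant_types), "allocatable"),
        (lambda: True, "valid"),  # catch-all default action
    ]
    for condition, result in actions:
        if condition():
            return result
    return "valid"  # unreachable; kept for type checkers


if __name__ == "__main__":
    print(disposition("\u0EB1\u0E81"))  # starts with U+0EB1, a nonspacing mark
    print(disposition("\u0E81\u0EB2"))  # consonant followed by vowel-after
```

The first call prints `invalid` because action 1 fires before any other action is considered. The second falls through to the catch-all and prints `valid`.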
The following lists the references cited for specific code points, variants, classes, rules or actions in this LGR. For General references refer to the "References" section in the Description.
[0] | The Unicode Standard 1.1. Any code point originally encoded in Unicode 1.1 |
[201] | Lao grammar book published by the Ministry of Education in 1967, see Appendix B, Figure 1 in [Proposal-Lao] |
[202] | Lao grammar book published by the Ministry of Education in 1967, see Appendix B, Figure 2 in [Proposal-Lao] |
[203] | Lao grammar book published by the Ministry of Education in 1967, see Appendix B, Figure 3 in [Proposal-Lao] |
[204] | Lao grammar book published by the Ministry of Education in 2000, see Appendix B, Figure 4 in [Proposal-Lao] |
[205] | Lao grammar book published by the Ministry of Education in 2000, see Appendix B, Figure 5 in [Proposal-Lao] |
[206] | Lao grammar book published by the Ministry of Education in 2000, see Appendix B, Figure 6 in [Proposal-Lao] |
[207] | Lao grammar 1935, see Appendix B, Figure 7 in [Proposal-Lao] |