This document is mechanically formatted from the XML file for the LGR. It provides additional summary data and explanatory text. The XML file remains the sole normative specification of the LGR.
Date | 2017-06-01 |
---|---|
LGR Version | 2 |
Language | und-Laoo |
Scope | domain: "." (Root) |
Unicode Version | 6.3.0 |
This file contains Label Generation Rules (LGR) for the Lao script as would be appropriate for the Root zone. For more details on this LGR, see "Proposal for a Lao Script Root Zone LGR [Proposal]". The format of this file follows [RFC 7940].
In addition to the 51 code points listed in Section 5 “Repertoire” of [Proposal], the sequence 0EB2 0EB0 has been defined to facilitate implementation of WLE rule follows-vbefore-consonant-cluster as a context rule. The repertoire only includes code points used by languages that are actively written in the Lao script. The repertoire is based on [MSR-2], which is a subset of Unicode 6.3 [Unicode 6.3].
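Evaluating the syllable ເກາະ (U+0EC0 U+0E81 U+0EB2 U+0EB0) one code point at a time shows why the sequence is needed. The minimal Python sketch below hand-codes tags for only these four code points, which is an illustrative assumption and not the full repertoire. It applies WLE Rule 4 (follows-C-tonemark-vabove) to each vowel-after. U+0EB0 fails because the code point before it, U+0EB2, is itself a vowel-after. Treating U+0EB2 U+0EB0 as a single repertoire element with its own context rule, follows-vbefore-consonant-cluster, avoids this failure.

```python
# Tags for the four code points of ເກາະ, taken from the repertoire table.
TAG = {
    "\u0EC0": "vowel-before",  # LAO VOWEL SIGN E
    "\u0E81": "consonant",     # LAO LETTER KO
    "\u0EB2": "vowel-after",   # LAO VOWEL SIGN AA
    "\u0EB0": "vowel-after",   # LAO VOWEL SIGN A
}

# follows-C-tonemark-vabove: the code point immediately before a vowel-after
# must be a consonant, semi-consonant, tone-mark or vowel-above.
ALLOWED_BEFORE = {"consonant", "semi-consonant", "tone-mark", "vowel-above"}


def vowel_after_ok(label: str, i: int) -> bool:
    """Check WLE Rule 4 for the vowel-after at index i, code point by code point."""
    return i > 0 and TAG.get(label[i - 1]) in ALLOWED_BEFORE


word = "\u0EC0\u0E81\u0EB2\u0EB0"  # ເກາະ
for i, ch in enumerate(word):
    if TAG[ch] == "vowel-after":
        print(f"U+{ord(ch):04X}", vowel_after_ok(word, i))
# U+0EB2 True
# U+0EB0 False
```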
Each code point or range is tagged with the script or scripts that the code point is used with, and with one or more references documenting sufficient justification for inclusion in the repertoire; see "References" below.
This LGR defines no variants.
Some consonants have been given the tag Cf, which indicates final consonants. The other character classes used are semi-consonant, tone-mark, vowel-above, vowel-before, vowel-below, vowel-after and sign. See Section 5 of the [Proposal].
The LGR includes the set of required default WLE rules and actions applicable to the Root Zone and defined in [MSR-2]. They are marked with ⍟. The default prohibition on leading combining marks is equivalent to ensuring that a label only starts with a consonant or vowel-before.
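The default rule leading-combining-mark checks only the General_Category of the first code point. The minimal sketch below uses Python's standard unicodedata module, whose Unicode version may be newer than 6.3.0 but should agree on these long-standing categories. It expands the 51-code-point repertoire from the class table and lists the code points with gc=Mn or gc=Mc, which are the ones this rule keeps out of label-initial position. No repertoire code point has gc=Mc, which is consistent with the ∅ shown for \p{gc=Mc} in the rule's regular expression. The names REPERTOIRE and is_combining are illustrative.

```python
import unicodedata

# Repertoire code points expanded from the class table ranges (illustrative sketch).
REPERTOIRE = (
    [0x0E81, 0x0E82, 0x0E84, 0x0E87, 0x0E88, 0x0E8A, 0x0E8D]
    + list(range(0x0E94, 0x0E98)) + list(range(0x0E99, 0x0EA0))
    + [0x0EA1, 0x0EA2, 0x0EA3, 0x0EA5, 0x0EA7, 0x0EAA, 0x0EAB, 0x0EAD, 0x0EAE]
    + [0x0EB0, 0x0EB1, 0x0EB2] + list(range(0x0EB4, 0x0EBA))
    + [0x0EBB, 0x0EBC, 0x0EBD] + list(range(0x0EC0, 0x0EC5))
    + [0x0EC6] + list(range(0x0EC8, 0x0ECE))
)


def is_combining(cp: int) -> bool:
    """True for General_Category Mn or Mc, as in leading-combining-mark."""
    return unicodedata.category(chr(cp)) in ("Mn", "Mc")


cannot_start = [cp for cp in REPERTOIRE if is_combining(cp)]
print(len(REPERTOIRE), len(cannot_start))  # 51 15
print(" ".join(f"{cp:04X}" for cp in cannot_start))
# 0EB1 0EB4 0EB5 0EB6 0EB7 0EB8 0EB9 0EBB 0EBC 0EC8 0EC9 0ECA 0ECB 0ECC 0ECD
```

The remaining code points are consonants, vowel-before, vowel-after, and U+0EC6. Of these, the vowel-after code points are further restricted by their required contexts.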
Rules provided in the LGR as described in Section 7 of [Proposal] reasonably restrict labels so that they conform to Lao syllable structure. These constraints are presented exclusively as context rules.
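The rules are:
1. A semi-consonant must follow a consonant (follows-consonant).
2. A vowel-before must precede a main consonant cluster (precedes-consonant).
3. A vowel-above or vowel-below must follow a main consonant or semi-consonant (follows-main-consonant).
4. A vowel-after must follow a main consonant, semi-consonant, tone-mark or vowel-above (follows-C-tonemark-vabove).
5. The sequence 0EB2 0EB0 must follow the vowel-before 0EC0 and a consonant cluster of one or two consonants, optionally followed by a semi-consonant (follows-vbefore-consonant-cluster).
6. A tone-mark must follow a main consonant, semi-consonant, vowel-above or vowel-below (follows-C-vabove-vbelow).
7. The sign 0ECC may only follow a final consonant (follows-Cf).
8. The sign 0EC6 may occur at most three times in a row, and only at the end of the label (repetition-mark-limit).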
No context rules apply to “consonant” code points. For discussion, see Section 5.1 “Consonants” in [Proposal].
For methodology and contributors, see Sections 4 and 8 of [Proposal].
Reference value ("ref" attribute) 0 refers to the version of the Unicode Standard in which the corresponding code points were initially encoded. Reference values 201 through 207 correspond to sources justifying the inclusion or classification of the corresponding code points. Single code points or ranges may have multiple source reference values.
Reference values ("ref" attribute) from 201 and up refer to specific sources cited for the corresponding code points in the [Proposal].
In addition, the following references are cited in this document:
For more details on the references, refer to the Table of References below. Several of these references refer to a figure in an appendix of the [Proposal] document.
Number of elements in Repertoire | 52 |
---|---|
Longest code point sequence | 2 |
Number of code points | 51 |
Number of sequences | 1 |
The following table lists the repertoire by code point (or code point sequence). The data in the Script and Name columns are extracted from the Unicode Character Database. Where a comment in the original LGR is equal to the character name, it has been suppressed.
See also the legend provided below the table.
Code Point | Glyph | Script | Name | References | Tags | Required Context | Comment |
---|---|---|---|---|---|---|---|
U+0E81 | ກ | Lao | LAO LETTER KO | [0], [201], [204] | Cf consonant | | Lao |
U+0E82 | ຂ | Lao | LAO LETTER KHO SUNG | [0], [201], [204] | consonant | | Lao |
U+0E84 | ຄ | Lao | LAO LETTER KHO TAM | [0], [201], [204] | consonant | | Lao |
U+0E87 | ງ | Lao | LAO LETTER NGO | [0], [201], [204] | Cf consonant | | Lao |
U+0E88 | ຈ | Lao | LAO LETTER CO | [0], [201], [204] | consonant | | Lao |
U+0E8A | ຊ | Lao | LAO LETTER SO TAM | [0], [201], [204] | Cf consonant | | Lao |
U+0E8D | ຍ | Lao | LAO LETTER NYO | [0], [201], [204] | Cf consonant | | Lao |
U+0E94 | ດ | Lao | LAO LETTER DO | [0], [201], [204] | Cf consonant | | Lao |
U+0E95 | ຕ | Lao | LAO LETTER TO | [0], [201], [204] | consonant | | Lao |
U+0E96 | ຖ | Lao | LAO LETTER THO SUNG | [0], [201], [204] | consonant | | Lao |
U+0E97 | ທ | Lao | LAO LETTER THO TAM | [0], [201], [204] | Cf consonant | | Lao |
U+0E99 | ນ | Lao | LAO LETTER NO | [0], [201], [204] | Cf consonant | | Lao |
U+0E9A | ບ | Lao | LAO LETTER BO | [0], [201], [204] | Cf consonant | | Lao |
U+0E9B | ປ | Lao | LAO LETTER PO | [0], [201], [204] | consonant | | Lao |
U+0E9C | ຜ | Lao | LAO LETTER PHO SUNG | [0], [201], [204] | consonant | | Lao |
U+0E9D | ຝ | Lao | LAO LETTER FO TAM | [0], [201], [204] | consonant | | Lao |
U+0E9E | ພ | Lao | LAO LETTER PHO TAM | [0], [201], [204] | consonant | | Lao |
U+0E9F | ຟ | Lao | LAO LETTER FO SUNG | [0], [201], [204] | Cf consonant | | Lao |
U+0EA1 | ມ | Lao | LAO LETTER MO | [0], [201], [204] | Cf consonant | | Lao |
U+0EA2 | ຢ | Lao | LAO LETTER YO | [0], [201], [204] | consonant | | Lao |
U+0EA3 | ຣ | Lao | LAO LETTER LO LING | [0], [204] | Cf consonant | | Lao |
U+0EA5 | ລ | Lao | LAO LETTER LO LOOT | [0], [201], [204] | Cf consonant | | Lao |
U+0EA7 | ວ | Lao | LAO LETTER WO | [0], [201], [204], [205] | Cf consonant | | Lao |
U+0EAA | ສ | Lao | LAO LETTER SO SUNG | [0], [201], [204] | Cf consonant | | Lao |
U+0EAB | ຫ | Lao | LAO LETTER HO SUNG | [0], [201], [204] | consonant | | Lao |
U+0EAD | ອ | Lao | LAO LETTER O | [0], [201], [204], [205] | consonant | | Lao |
U+0EAE | ຮ | Lao | LAO LETTER HO TAM | [0], [201], [204] | consonant | | Lao |
U+0EB0 | ະ | Lao | LAO VOWEL SIGN A | [0], [201], [205], [206] | vowel-after | follows-C-tonemark-vabove | Lao |
U+0EB1 | ັ | Lao | LAO VOWEL SIGN MAI KAN | [0], [201], [205], [206] | vowel-above | follows-main-consonant | Lao |
U+0EB2 | າ | Lao | LAO VOWEL SIGN AA | [0], [201], [205], [206] | vowel-after | follows-C-tonemark-vabove | Lao |
U+0EB2 U+0EB0 | າະ | | | [205] | | follows-vbefore-consonant-cluster | Lao |
U+0EB4 | ິ | Lao | LAO VOWEL SIGN I | [0], [201], [205], [206] | vowel-above | follows-main-consonant | Lao |
U+0EB5 | ີ | Lao | LAO VOWEL SIGN II | [0], [201], [205], [206] | vowel-above | follows-main-consonant | Lao |
U+0EB6 | ຶ | Lao | LAO VOWEL SIGN Y | [0], [201], [205], [206] | vowel-above | follows-main-consonant | Lao |
U+0EB7 | ື | Lao | LAO VOWEL SIGN YY | [0], [201], [205], [206] | vowel-above | follows-main-consonant | Lao |
U+0EB8 | ຸ | Lao | LAO VOWEL SIGN U | [0], [201], [205], [206] | vowel-below | follows-main-consonant | Lao |
U+0EB9 | ູ | Lao | LAO VOWEL SIGN UU | [0], [201], [205], [206] | vowel-below | follows-main-consonant | Lao |
U+0EBB | ົ | Lao | LAO VOWEL SIGN MAI KON | [0], [205] | vowel-above | follows-main-consonant | Lao |
U+0EBC | ຼ | Lao | LAO SEMIVOWEL SIGN LO | [0], [201], [205], [206] | semi-consonant | follows-consonant | Lao |
U+0EBD | ຽ | Lao | LAO SEMIVOWEL SIGN NYO | [0], [201], [205] | vowel-after | follows-C-tonemark-vabove | Lao |
U+0EC0 | ເ | Lao | LAO VOWEL SIGN E | [0], [201], [205], [206] | vowel-before | precedes-consonant | Lao |
U+0EC1 | ແ | Lao | LAO VOWEL SIGN EI | [0], [201], [205], [206] | vowel-before | precedes-consonant | Lao |
U+0EC2 | ໂ | Lao | LAO VOWEL SIGN O | [0], [201], [205], [206] | vowel-before | precedes-consonant | Lao |
U+0EC3 | ໃ | Lao | LAO VOWEL SIGN AY | [0], [201], [205], [206] | vowel-before | precedes-consonant | Lao |
U+0EC4 | ໄ | Lao | LAO VOWEL SIGN AI | [0], [201], [205], [206] | vowel-before | precedes-consonant | Lao |
U+0EC6 | ໆ | Lao | LAO KO LA | [0], [203] | sign | repetition-mark-limit | LAO MAY SAM |
U+0EC8 | ່ | Lao | LAO TONE MAI EK | [0], [202] | tone-mark | follows-C-vabove-vbelow | Lao |
U+0EC9 | ້ | Lao | LAO TONE MAI THO | [0], [202] | tone-mark | follows-C-vabove-vbelow | Lao |
U+0ECA | ໊ | Lao | LAO TONE MAI TI | [0], [202] | tone-mark | follows-C-vabove-vbelow | LAO TONE MAI JATTAWA |
U+0ECB | ໋ | Lao | LAO TONE MAI CATAWA | [0], [202] | tone-mark | follows-C-vabove-vbelow | LAO MARK MAI KA LAN |
U+0ECC | ໌ | Lao | LAO CANCELLATION MARK | [0], [207] | sign | follows-Cf | LAO VOWEL SIGN OR |
U+0ECD | ໍ | Lao | LAO NIGGAHITA | [0], [201], [205], [206] | vowel-above | follows-main-consonant | Lao |
Legend
Throughout this LGR, a code point sequence may be annotated with a string in ALL CAPS that is constructed on the same principle as a name for a Unicode Named Sequence. No claim is made that a sequence thus annotated is in fact a named sequence, nor that the annotation in such case actually corresponds to the formal name of a named sequence.
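The repertoire table takes its Name column from the Unicode Character Database and suppresses any comment that merely repeats the character name. The small Python sketch below mirrors both steps with the standard unicodedata module. The helper names name_column and comment_column are hypothetical. Python bundles a newer UCD than 6.3.0, but character names are immutable under Unicode's name stability policy, so the Name values agree. The Script property is not exposed by unicodedata and would need another source.

```python
import unicodedata


def name_column(cp: int) -> str:
    """Character name as reported by Python's copy of the UCD."""
    return unicodedata.name(chr(cp))


def comment_column(cp: int, comment: str) -> str:
    """Suppress an LGR comment that is identical to the character name."""
    return "" if comment == name_column(cp) else comment


print(name_column(0x0EC6))                                       # LAO KO LA
print(repr(comment_column(0x0ECC, "LAO CANCELLATION MARK")))     # ''
```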
This LGR does not specify any variants.
The following table lists all named and implicit classes with their definition and a list of their members intersected with the current repertoire (for larger classes, this list is elided).
Name | Definition | Count | Members or Ranges | Ref | Comment |
---|---|---|---|---|---|
Cf | Tag=Cf | 14 Elements: | {0E81 0E87 0E8A 0E8D 0E94 0E97 0E99-0E9A 0E9F 0EA1 0EA3 0EA5 0EA7 0EAA} | ||
consonant | Tag=consonant | 27 Elements: | {0E81-0E82 0E84 0E87-0E88 0E8A 0E8D 0E94-0E97 0E99-0E9F 0EA1-0EA3 0EA5 0EA7 0EAA-0EAB 0EAD-0EAE} | ||
semi-consonant | Tag=semi-consonant | 1 Element: | {0EBC} | ||
tone-mark | Tag=tone-mark | 4 Elements: | {0EC8-0ECB} | ||
vowel-above | Tag=vowel-above | 7 Elements: | {0EB1 0EB4-0EB7 0EBB 0ECD} | ||
vowel-below | Tag=vowel-below | 2 Elements: | {0EB8-0EB9} | ||
implicit | Tag=vowel-after | 3 Elements: | {0EB0 0EB2 0EBD} | ||
implicit | Tag=vowel-before | 5 Elements: | {0EC0-0EC4} | ||
implicit | Tag=sign | 2 Elements: | {0EC6 0ECC} | ||
implicit | Tag=sc:Laoo | 51 Elements: | {0E81-0E82 0E84 0E87-0E88 0E8A 0E8D 0E94-0E97 0E99-0E9F 0EA1-0EA3 0EA5 0EA7 0EAA-0EAB 0EAD-0EAE ...} |
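The Members or Ranges column uses a compact notation of hexadecimal code points and hyphenated ranges. The following Python sketch expands that notation with a hypothetical helper, parse_members, and checks the results against the Count column. It also assumes that every repertoire code point carries exactly one tag other than Cf, which holds for the lists above, and verifies that these tags partition the 51 code points. The sc:Laoo list is omitted because it is elided in the table.

```python
from typing import Dict, Set

# Member lists copied from the class table (sc:Laoo omitted; it is elided there).
CLASSES = {
    "Cf": "{0E81 0E87 0E8A 0E8D 0E94 0E97 0E99-0E9A 0E9F 0EA1 0EA3 0EA5 0EA7 0EAA}",
    "consonant": "{0E81-0E82 0E84 0E87-0E88 0E8A 0E8D 0E94-0E97 0E99-0E9F 0EA1-0EA3 0EA5 0EA7 0EAA-0EAB 0EAD-0EAE}",
    "semi-consonant": "{0EBC}",
    "tone-mark": "{0EC8-0ECB}",
    "vowel-above": "{0EB1 0EB4-0EB7 0EBB 0ECD}",
    "vowel-below": "{0EB8-0EB9}",
    "vowel-after": "{0EB0 0EB2 0EBD}",
    "vowel-before": "{0EC0-0EC4}",
    "sign": "{0EC6 0ECC}",
}


def parse_members(spec: str) -> Set[int]:
    """Expand '{0E81-0E82 0E84}'-style notation into a set of code points."""
    members: Set[int] = set()
    for token in spec.strip("{}").split():
        if "-" in token:
            first, last = (int(part, 16) for part in token.split("-"))
            members.update(range(first, last + 1))
        else:
            members.add(int(token, 16))
    return members


sets: Dict[str, Set[int]] = {name: parse_members(spec) for name, spec in CLASSES.items()}
print({name: len(s) for name, s in sets.items()})

# Cf is a subset of consonant; the remaining tags partition the repertoire.
assert sets["Cf"] <= sets["consonant"]
partition = [s for name, s in sets.items() if name != "Cf"]
union = set().union(*partition)
print(len(union), sum(len(s) for s in partition))  # 51 51
```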
The following table lists all named rules defined in the LGR and indicates whether they are used as a trigger in an action or as a context (when or not-when) for a code point. (Any use of context rules for variants is not indicated.)
Name | Used as Trigger | Used as Context | Anchor | Regular Expression | Ref | Comment |
---|---|---|---|---|---|---|
leading-combining-mark | ✔ | | | (^[[\p{gc=Mn}][∅=\p{gc=Mc}]]) | | Default WLE rule from MSR-2 matching labels with leading combining marks ⍟ |
follows-consonant | | ✔ | ✔ | ((?<=[:consonant:])⚓) | | WLE Rule 1: A semi-consonant must follow a consonant |
precedes-consonant | | ✔ | ✔ | (⚓(?=[:consonant:])) | | WLE Rule 2: A vowel-before precedes a main consonant cluster |
follows-main-consonant | | ✔ | ✔ | ((?<=([:consonant:]|[:semi-consonant:]))⚓) | | WLE Rule 3: A vowel-above or vowel-below follows a main consonant C |
follows-C-tonemark-vabove | | ✔ | ✔ | ((?<=([:consonant:]|[:semi-consonant:]|[:tone-mark:]|[:vowel-above:]))⚓) | | WLE Rule 4: A vowel-after follows a main consonant, tone-mark or vowel-above |
consonant-cluster | | | | ([:consonant:]{1,2}[:semi-consonant:]?) | | Defining consonant cluster for WLE Rule 5 |
follows-vbefore-consonant-cluster | | ✔ | ✔ | ((?<=\u0EC0(:consonant-cluster:))⚓) | | WLE Rule 5: The sequence (0EB2 0EB0) follows the vowel-before 0EC0 and a consonant cluster |
follows-C-vabove-vbelow | | ✔ | ✔ | ((?<=([:consonant:]|[:semi-consonant:]|[:vowel-above:]|[:vowel-below:]))⚓) | | WLE Rule 6: A tone-mark follows a main consonant, vowel-above or vowel-below |
follows-Cf | | ✔ | ✔ | ((?<=[:Cf:])⚓) | | WLE Rule 7: The sign 0ECC can only occur after final consonants |
repetition-mark-limit | | ✔ | ✔ | (⚓(?=\u0EC6{0,2}$)) | | WLE Rule 8: The sign 0EC6 can occur at most 3 times in a row, and only at the end of the label |
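The context rules above can be approximated in Python. Python's re module only supports fixed-width lookbehind, so this sketch does not use lookbehind. Instead, each lookbehind rule checks that the text before an element ends with the pattern, and each lookahead rule checks that the text after it starts with the pattern. The sketch rests on three assumptions. First, it uses a greedy segmentation that treats U+0EB2 U+0EB0 as one element; this is not a statement of how RFC 7940 implementations must segment labels. Second, it does not check repertoire membership or the leading-combining-mark rule. Third, all names (RULES, WHEN, contexts_ok) are hypothetical.

```python
import re

# Tag classes from the class table, written as regex character classes.
C = (r"[\u0E81\u0E82\u0E84\u0E87\u0E88\u0E8A\u0E8D\u0E94-\u0E97\u0E99-\u0E9F"
     r"\u0EA1-\u0EA3\u0EA5\u0EA7\u0EAA\u0EAB\u0EAD\u0EAE]")
CF = r"[\u0E81\u0E87\u0E8A\u0E8D\u0E94\u0E97\u0E99\u0E9A\u0E9F\u0EA1\u0EA3\u0EA5\u0EA7\u0EAA]"
SEMI = r"\u0EBC"
TONE = r"[\u0EC8-\u0ECB]"
VABOVE = r"[\u0EB1\u0EB4-\u0EB7\u0EBB\u0ECD]"
VBELOW = r"[\u0EB8\u0EB9]"


def follows(pattern):
    """Lookbehind-style rule: the text before the element must end with pattern."""
    rx = re.compile("(?:" + pattern + r")\Z")
    return lambda before, after: rx.search(before) is not None


def precedes(pattern):
    """Lookahead-style rule: the text after the element must start with pattern."""
    rx = re.compile(pattern)
    return lambda before, after: rx.match(after) is not None


RULES = {
    "follows-consonant": follows(C),
    "precedes-consonant": precedes(C),
    "follows-main-consonant": follows(f"{C}|{SEMI}"),
    "follows-C-tonemark-vabove": follows(f"{C}|{SEMI}|{TONE}|{VABOVE}"),
    "follows-vbefore-consonant-cluster": follows(r"\u0EC0" + C + "{1,2}" + SEMI + "?"),
    "follows-C-vabove-vbelow": follows(f"{C}|{SEMI}|{VABOVE}|{VBELOW}"),
    "follows-Cf": follows(CF),
    "repetition-mark-limit": precedes(r"\u0EC6{0,2}\Z"),
}

# Required context ("when" rule) per repertoire element, from the repertoire table.
WHEN = {
    "\u0EB0": "follows-C-tonemark-vabove",
    "\u0EB2": "follows-C-tonemark-vabove",
    "\u0EBD": "follows-C-tonemark-vabove",
    "\u0EB2\u0EB0": "follows-vbefore-consonant-cluster",
    "\u0EBC": "follows-consonant",
    "\u0EC6": "repetition-mark-limit",
    "\u0ECC": "follows-Cf",
}
WHEN.update(dict.fromkeys("\u0EB1\u0EB4\u0EB5\u0EB6\u0EB7\u0EB8\u0EB9\u0EBB\u0ECD", "follows-main-consonant"))
WHEN.update(dict.fromkeys("\u0EC0\u0EC1\u0EC2\u0EC3\u0EC4", "precedes-consonant"))
WHEN.update(dict.fromkeys("\u0EC8\u0EC9\u0ECA\u0ECB", "follows-C-vabove-vbelow"))

SEQUENCES = {"\u0EB2\u0EB0"}


def contexts_ok(label: str) -> bool:
    """True if every element's required context is satisfied."""
    i = 0
    while i < len(label):
        # Assumed greedy segmentation: prefer the two-code-point sequence.
        elem = label[i:i + 2] if label[i:i + 2] in SEQUENCES else label[i]
        rule = WHEN.get(elem)
        if rule is not None and not RULES[rule](label[:i], label[i + len(elem):]):
            return False
        i += len(elem)
    return True


print(contexts_ok("\u0EC0\u0E81\u0EB2\u0EB0"))        # ເກາະ: True
print(contexts_ok("\u0E81\u0EB2\u0EB0"))              # ກາະ: False (no preceding U+0EC0)
print(contexts_ok("\u0E81\u0EC6\u0EC6\u0EC6\u0EC6"))  # four U+0EC6: False
```

If "\u0EB2\u0EB0" is removed from SEQUENCES, ເກາະ fails at U+0EB0 under follows-C-tonemark-vabove. That failure is the motivation given for defining the sequence in the repertoire.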
The following table lists the actions that are used to assign dispositions to labels and variant labels, based on the specified conditions. The order of actions defines their precedence: the first action triggered by a label is the one defining its disposition.
# | Condition | Rule / Variant Set | | Disposition | Ref | Comment |
---|---|---|---|---|---|---|
1 | if label matches | leading-combining-mark | → | invalid | | labels with leading combining marks are invalid ⍟ |
2 | if at least one variant is in | {out-of-repertoire-var} | → | invalid | | any variant label with a code point out of repertoire is invalid ⍟ |
3 | if at least one variant is in | {blocked} | → | blocked | | any variant label containing blocked variants is blocked ⍟ |
4 | if each variant is in | {allocatable} | → | allocatable | | variant labels with all variants allocatable are allocatable ⍟ |
5 | if any label (catch-all) | | → | valid | | catch all (default action) ⍟ |
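Actions are evaluated in order, and the first action whose condition holds determines the disposition. The minimal Python sketch below models this first-match logic; ACTIONS and disposition are hypothetical names. This LGR defines no variants, so the sketch assumes that the variant-based actions 2 to 4 never fire for an original label. It also assumes that repertoire membership and required contexts are checked separately.

```python
import unicodedata
from typing import Callable, List, Tuple

# (rule or variant-set description, condition on the label, disposition)
Action = Tuple[str, Callable[[str], bool], str]


def leading_combining_mark(label: str) -> bool:
    """Default MSR-2 trigger: first code point has General_Category Mn or Mc."""
    return bool(label) and unicodedata.category(label[0]) in ("Mn", "Mc")


ACTIONS: List[Action] = [
    ("leading-combining-mark", leading_combining_mark, "invalid"),
    # Actions 2-4 test variant types. This LGR defines no variants, so in this
    # sketch they are assumed never to fire for an original label.
    ("{out-of-repertoire-var}", lambda label: False, "invalid"),
    ("{blocked}", lambda label: False, "blocked"),
    ("{allocatable}", lambda label: False, "allocatable"),
    ("catch-all", lambda label: True, "valid"),
]


def disposition(label: str) -> str:
    """Return the disposition of the first action whose condition holds."""
    for _name, condition, result in ACTIONS:
        if condition(label):
            return result
    raise RuntimeError("unreachable: the catch-all action always fires")


print(disposition("\u0EB4\u0E81"))  # starts with U+0EB4 (gc=Mn): invalid
print(disposition("\u0E81\u0EB4"))  # valid
```

Listing actions as data keeps their precedence explicit: reordering the list changes which disposition wins, just as reordering the rows of the actions table would.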
Note: The following variant types are used in one or more actions, but are not defined in this LGR: out-of-repertoire-var, blocked, allocatable. This is not necessarily an error.
[0] | The Unicode Standard 1.1, The Unicode Consortium, Mountain View, CA, 1993. Any code point originally encoded in Unicode 1.1 |
---|---|
[201] | Lao grammar book published by the Ministry of Education in 1967, see Appendix B, Figure 1 |
[202] | Lao grammar book published by the Ministry of Education in 1967, see Appendix B, Figure 2 |
[203] | Lao grammar book published by the Ministry of Education in 1967, see Appendix B, Figure 3 |
[204] | Lao grammar book published by the Ministry of Education in 2000, see Appendix B, Figure 4 |
[205] | Lao grammar book published by the Ministry of Education in 2000, see Appendix B, Figure 5 |
[206] | Lao grammar book published by the Ministry of Education in 2000, see Appendix B, Figure 6 |
[207] | Lao grammar, 1935, see Appendix B, Figure 7 |