This file contains Label Generation Rules (LGR) for the Thai script for the Root zone. For more details on this LGR and its development, as well as background on the script, see "Proposal for a Thai Script Root Zone LGR [Proposal-Thai]". This file is one of a set of LGR files that together form an integrated LGR for the DNS Root Zone [RZ-LGR-4]. The format of this file follows [RFC 7940].
In addition to the 68 single code points according to Section 5, "Repertoire" in [Proposal-Thai], three sequences have been defined. The sequence U+0E4D U+0E32 was defined to replace the disallowed U+0E33 (SARA AM) and to facilitate implementation of the WLE rule follows-consonant-tone as a context rule. The other two sequences were defined to restrict U+0E45 (LAKKHANGYAO) from appearing in any context other than these sequences. Accordingly, while U+0E45 is not listed by itself, it brings the total of distinct code points to 69.
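The effect of a sequence-only code point can be sketched in Python. This is a minimal illustration, not the LGR's actual processing: the repertoire below is a toy with two single code points, and the base letter paired with U+0E45 is an assumption made for the example, since the text above does not list the two sequences. The names can_partition and distinct_code_points are hypothetical.

```python
# Toy repertoire for illustration only; NOT the actual 68-entry Thai repertoire.
SINGLES = {"\u0E01", "\u0E24"}   # two code points listed on their own
SEQUENCES = {"\u0E24\u0E45"}     # assumed sequence containing U+0E45


def can_partition(label, singles=SINGLES, sequences=SEQUENCES):
    """Return True if the label splits entirely into repertoire elements."""
    n = len(label)
    reachable = [False] * (n + 1)  # reachable[i]: label[:i] is fully covered
    reachable[0] = True
    for i in range(n):
        if not reachable[i]:
            continue
        # A single code point from the repertoire covers one position.
        if label[i] in singles:
            reachable[i + 1] = True
        # A defined sequence covers all of its code points at once.
        for seq in sequences:
            if label.startswith(seq, i):
                reachable[i + len(seq)] = True
    return reachable[n]


def distinct_code_points(singles=SINGLES, sequences=SEQUENCES):
    """Collect every code point that occurs alone or inside a sequence."""
    return set(singles) | {cp for seq in sequences for cp in seq}
```

In this toy, U+0E45 is accepted only directly after its sequence partner, and it raises the count of distinct code points by one over the number of singles, which mirrors the step from 68 to 69 described above.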
The repertoire only includes code points used by languages that are actively written in the Thai script. The repertoire is based on [MSR-4], which is a subset of [Unicode 6.3].
As part of the Root Zone, this LGR includes neither digits nor the HYPHEN-MINUS.
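As an illustration, a pre-check for this restriction might look like the following Python sketch. The LGR itself enforces the restriction simply by leaving these code points out of the repertoire; the function name has_digit_or_hyphen is hypothetical, and treating every Unicode decimal digit (general category Nd, which covers both ASCII and Thai digits) as a "digit" is an assumption of this sketch.

```python
import unicodedata


def has_digit_or_hyphen(label: str) -> bool:
    """Return True if the label contains HYPHEN-MINUS or any decimal digit."""
    return any(
        ch == "-" or unicodedata.category(ch) == "Nd"
        for ch in label
    )
```

A label for which this returns True could be rejected before any further evaluation against the repertoire and rules.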
Each code point or range is tagged with the script or scripts that the code point is used with, and one or more references documenting sufficient justification for inclusion in the repertoire; see "References" below.
According to Section 6, "Variants" in [Proposal-Thai], this LGR defines no variants.
The Thai script is an abugida in which consonant–vowel sequences are written as a unit: each unit is based on a consonant letter, and vowels, tone marks, and diacritics are secondary. It is written with the combining marks stacked above or below the base consonant, like diacritics in European languages. However, although the concepts are quite similar, the implementations are significantly different.
Consonants: There are 44 characters that are classified as consonants; code points from this subset have been given the tag "cons". See Section 5.1, "Consonants" in [Proposal-Thai].
Vowels: The 18 vowel symbols pronounced after a consonant are non-sequential: they can be located before (lv), after (fv), above (av) or below (bv) the consonant, or in a combination of these positions. Code points from this subset have been given the tags "fv1", "fv2", "fv3", "av", "bv", or "lv". There are three code point sequences defined that include vowels. (Code point sequences do not carry tag values; instead, for code point sequences the subset values are identified in comments.) See Section 5.2, "Vowels" in [Proposal-Thai].
Tones: There are 5 phonemic tones: mid, low, falling, high, and rising. These 5 tones are represented by 4 tone marks plus the absence of a mark. Code points from this subset have been given the tag "tone". See Section 5.3, "Tone Marks" in [Proposal-Thai].
Diacritical Marks: There are 3 above diacritic symbols that have been included here and given the tag "ad". They differ in their frequency and purpose of usage. See also the discussion in Section 5.4, "Diacritics" in [Proposal-Thai].
A fourth above diacritic, U+0E4E (YAMAKKAN), has been excluded from the Root Zone LGR repertoire because it is rarely used in Modern Thai or even in older Pali manuscripts; it is more common to replace it with U+0E3A (PHINTHU), a below diacritic, which has been given the tag "bd". Moreover, excluding U+0E4E (YAMAKKAN) also eliminates the chance of confusion between U+0E4E (YAMAKKAN) and U+0E4C (THANTHAKHAT). Both look similar, are always placed at the same position in the word cell, and they are normally displayed in a small size.
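The tagging described above can be sketched as a partial lookup table in Python. This is only an illustration under stated assumptions: it assumes the 44 consonants are the Thai letters U+0E01 through U+0E2E minus U+0E24 and U+0E26, and that the 4 tone marks are U+0E48 through U+0E4B. The vowel tags and the "ad" diacritics are left out because their exact membership is not given here. The names tag_of and EXCLUDED are hypothetical.

```python
# Assumed membership; see [Proposal-Thai] for the authoritative lists.
CONSONANTS = {chr(cp) for cp in range(0x0E01, 0x0E2F)} - {"\u0E24", "\u0E26"}
TONES = {chr(cp) for cp in range(0x0E48, 0x0E4C)}
BELOW_DIACRITICS = {"\u0E3A"}    # PHINTHU, tagged "bd"
EXCLUDED = {"\u0E4E", "\u0E33"}  # YAMAKKAN and SARA AM are not in the repertoire


def tag_of(ch):
    """Return the tag for a code point, or None if this partial table lacks it."""
    if ch in EXCLUDED:
        return None
    if ch in CONSONANTS:
        return "cons"
    if ch in TONES:
        return "tone"
    if ch in BELOW_DIACRITICS:
        return "bd"
    return None  # vowels and "ad" diacritics are not covered by this sketch
```

Under these assumptions the sets have 44 and 4 members respectively, matching the counts given for consonants and tone marks.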
The LGR includes the set of required default WLE rules and actions applicable to the Root Zone and defined in [MSR-4]. They are marked with ⍟.
The rules provided in this LGR as described in Section 7 of [Proposal-Thai] reasonably restrict labels so that they conform to Thai syllable structure. These constraints are exclusively presented as context rules.
The rules are:
The Root Zone LGR for the Thai script was developed by the Thai Generation Panel. For methodology and contributors, see Sections 4 and 8 in [Proposal-Thai], as well as [RZ-LGR-4-Overview].
The following general references are cited in this document:
For references consulted particularly in designing the repertoire for the Thai script for the Root Zone, please see details in the Table of References below. Reference [0] refers to the Unicode Standard version in which the corresponding code points were initially encoded. References [100] and [101] correspond to sources given in [Proposal-Thai] for justifying the inclusion of the corresponding code points. Entries in the table may have multiple source reference values.
]]>