﻿<?xml version="1.0" encoding="utf-8"?>
<lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0">
  <meta>
    <version comment="Thai Script Root Zone LGR Version 6.9">2</version>
    <date>2017-05-25</date>
    <language>und-Thai</language>
    <scope type="domain">.</scope>
    <unicode-version>6.3.0</unicode-version>
    <description type="text/html"><![CDATA[

<h1>Label Generation Rules for the Thai Script</h1>
<h2>Overview</h2>
<p>This file contains Label Generation Rules (LGR) for the Thai script as would be appropriate for the Root Zone. For more details on this LGR, see "Proposal for a Thai Script Root Zone LGR" [Proposal].</p>
<h2>Repertoire</h2>
<p>In addition to the 68 code points listed in Section 5 "Repertoire" of [Proposal], three sequences have been defined. The sequence U+0E4D U+0E32 was defined to replace the disallowed U+0E33 (THAI CHARACTER SARA AM) and to facilitate implementation of the WLE rule <b>follows-consonant-tone</b> as a context rule. The other two sequences, U+0E24 U+0E45 and U+0E26 U+0E45, were defined to restrict U+0E45 (THAI CHARACTER LAKKHANGYAO) from appearing in any context other than these sequences. Accordingly, although U+0E45 is not listed by itself, it brings the total number of distinct code points to 69.</p>
<h2>Variants</h2>
<p>According to Section 6 "Variants" of [Proposal], this LGR defines no variants.</p>
<h2>Character Classes</h2>
<p>The Thai script is an abugida in which consonant–vowel sequences are written as a unit: each unit is based on a consonant letter, and vowel, tone-mark or diacritic notation is secondary. It is written with the combining marks stacked above or below the base consonant, like diacritics in European languages. However, although the concepts are quite similar, the implementations are significantly different.</p>
<p>There are 44 characters classified as consonants; code points from this subset have been given the tag "cons".</p>
<p>The 18 vowel symbols pronounced after a consonant are non-sequential: they can be located before (lv), after (fv), above (av) or below (bv) the consonant, or in a combination of these positions. Code points from this subset have been given the tags "fv1", "fv2", "fv3", "av", "bv" and "lv". There are three code point sequences defined that include vowels. (Code point sequences do not carry tag values; instead, the subset values for code point sequences are identified in comments.)</p>
<p>There are 5 phonemic tones: mid, low, falling, high, and rising. These 5 tones are represented by 4 tone marks plus the absence of a mark. Code points from this subset have been given the tag "tone".</p>
<p>There are 3 diacritic symbols that have been included here and given the tag "ad". They differ in their frequency and purpose of usage. See also the discussion in Section 5.4 of [Proposal].</p>
<ul>
    <li>U+0E47 (MAITAIKHU) and U+0E4C (THANTHAKHAT) are commonly used in everyday words.</li>
    <li>U+0E4D (NIKHAHIT) is included because of its use to decompose U+0E33 (SARA AM, ําา)  which is in common use, but NIKHAHIT may also be used by itself.</li>
</ul>
    <p>The Thai GP decided to exclude a fourth above diacritic, U+0E4E (YAMAKKAN), from the LGR repertoire because it is rarely used in Modern Thai or even in older Pali manuscripts; it is more common to replace it with U+0E3A (PHINTHU). Moreover, excluding U+0E4E (YAMAKKAN) also eliminates the chance of confusion between U+0E4E (YAMAKKAN) and U+0E4C (THANTHAKHAT). Both look similar, are always placed at the same position in the word cell, and they are normally displayed in a small size.</p>
<h2>Whole Label Evaluation (WLE) and Context Rules</h2>
            <h3>Default Whole Label Evaluation Rules</h3>
            <p>The LGR includes the set of required default WLE rules and actions applicable to the Root Zone and defined in [MSR-2].</p> 
      <h3>Thai-specific Rules</h3>
<p>The rules provided in this LGR, as described in Section 7 of [Proposal], reasonably restrict labels so that they conform to Thai syllable structure. 
These constraints are presented exclusively as context rules.</p>
<p>The rules are: </p>
<ul>
  <li><b>A leading-vowel must precede a consonant</b> - See Section 7.2 in [Proposal]</li>
  <li><b>A below-vowel must follow a consonant</b> - See Section 7.3 in [Proposal]</li>
  <li><b>An above-vowel must follow a consonant</b> - See Section 7.3 in [Proposal]</li>
  <li><b>A below diacritic must follow a consonant</b> - See Section 7.3 in [Proposal]</li>
  <li><b>An above-diacritic-maitaikhu must follow a consonant</b> - See Section 7.3 in [Proposal]</li>
  <li><b>A vowel Mai Han Akat must be in between a consonant and either tone or consonant</b> - See Section 7.4 in [Proposal]</li>
  <li><b>A vowel Sara A can follow a consonant, a tone or a vowel Sara Aa</b> - See Section 7.5 in [Proposal]</li>
  <li><b>A vowel Sara-Aa, or an above diacritic Nikhahit followed by a vowel Sara-Aa, can follow a consonant or a tone</b> - See Sections 7.6 and 7.9 in [Proposal]</li>
  <li><b>A tone-mark, THANTHAKHAT or NIKHAHIT can only follow a consonant, above-vowel or below-vowel</b> - See Sections 7.7 and 7.8 in [Proposal]</li>
</ul>
<h2>Methodology and Contributors</h2>
         <p>For methodology and contributors, see Sections 4 and 8 of [Proposal].</p>
 
         <h2>References</h2> 
         <p>Reference [0] refers to the Unicode Standard version
            in which the corresponding code points were initially encoded. References [100] and up correspond to sources given in [Proposal] for justifying the inclusion of the corresponding
            code points. A single code point or range may have
            multiple source reference values.</p>
 
         <p>In addition, the following references are cited in this document:</p>
         <dl class="references">
           <dt>[MSR-2]</dt>
           <dd>Integration Panel, "Maximal Starting Repertoire — MSR-2 Overview and Rationale", 14 April 2015
              https://www.icann.org/en/system/files/files/msr-2-overview-14apr15-en.pdf</dd>
           <dt>[Proposal]</dt>
           <dd><i>Proposal for a Thai Script Root Zone LGR</i>, 15 December 2016,
              https://www.icann.org/en/system/files/files/proposal-thai-lgr-15dec16-en.pdf</dd> 
           <dt>[Unicode 6.3]</dt>
          <dd>The Unicode Consortium. The Unicode Standard, Version 6.3.0, (Mountain View, CA: The Unicode Consortium, 2013. ISBN 978-1-936213-08-5) 
             http://www.unicode.org/versions/Unicode6.3.0/</dd>
          </dl>
          <p>For more details on references [0] and up and [100] and up, refer to the <a href="#table_of_references">Table of References</a> below.</p>
]]></description>
    <references>
      <reference id="0">The Unicode Standard 1.1</reference>
      <reference id="100">Thai Industrial Standard (TIS) 1566-2541 (1998) (http://www.ratchakitcha.soc.go.th/DATA/PDF/2542/E/088/9.PDF)</reference>
      <reference id="101">Computers and the Thai Language (http://lexitron.nectec.or.th/KM_HL5001/file_HL5001/Paper/Inter%20Journal/krrn_52085.pdf)</reference>
    </references>
  </meta>
  <data>
    <char cp="0E01" tag="cons" ref="0 100 101" />
    <char cp="0E02" tag="cons" ref="0 100 101" />
    <char cp="0E03" tag="cons" ref="0 100 101" />
    <char cp="0E04" tag="cons" ref="0 100 101" />
    <char cp="0E05" tag="cons" ref="0 100 101" />
    <char cp="0E06" tag="cons" ref="0 100 101" />
    <char cp="0E07" tag="cons" ref="0 100 101" />
    <char cp="0E08" tag="cons" ref="0 100 101" />
    <char cp="0E09" tag="cons" ref="0 100 101" />
    <char cp="0E0A" tag="cons" ref="0 100 101" />
    <char cp="0E0B" tag="cons" ref="0 100 101" />
    <char cp="0E0C" tag="cons" ref="0 100 101" />
    <char cp="0E0D" tag="cons" ref="0 100 101" />
    <char cp="0E0E" tag="cons" ref="0 100 101" />
    <char cp="0E0F" tag="cons" ref="0 100 101" />
    <char cp="0E10" tag="cons" ref="0 100 101" />
    <char cp="0E11" tag="cons" ref="0 100 101" />
    <char cp="0E12" tag="cons" ref="0 100 101" />
    <char cp="0E13" tag="cons" ref="0 100 101" />
    <char cp="0E14" tag="cons" ref="0 100 101" />
    <char cp="0E15" tag="cons" ref="0 100 101" />
    <char cp="0E16" tag="cons" ref="0 100 101" />
    <char cp="0E17" tag="cons" ref="0 100 101" />
    <char cp="0E18" tag="cons" ref="0 100 101" />
    <char cp="0E19" tag="cons" ref="0 100 101" />
    <char cp="0E1A" tag="cons" ref="0 100 101" />
    <char cp="0E1B" tag="cons" ref="0 100 101" />
    <char cp="0E1C" tag="cons" ref="0 100 101" />
    <char cp="0E1D" tag="cons" ref="0 100 101" />
    <char cp="0E1E" tag="cons" ref="0 100 101" />
    <char cp="0E1F" tag="cons" ref="0 100 101" />
    <char cp="0E20" tag="cons" ref="0 100 101" />
    <char cp="0E21" tag="cons" ref="0 100 101" />
    <char cp="0E22" tag="cons" ref="0 100 101" />
    <char cp="0E23" tag="cons" ref="0 100 101" />
    <char cp="0E24" tag="fv3" ref="0 100 101" />
    <char cp="0E24 0E45" ref="0 100 101" comment="fv2" />
    <char cp="0E25" tag="cons" ref="0 100 101" />
    <char cp="0E26" tag="fv3" ref="0 100 101" />
    <char cp="0E26 0E45" ref="0 100 101" comment="fv2" />
    <char cp="0E27" tag="cons" ref="0 100 101" />
    <char cp="0E28" tag="cons" ref="0 100 101" />
    <char cp="0E29" tag="cons" ref="0 100 101" />
    <char cp="0E2A" tag="cons" ref="0 100 101" />
    <char cp="0E2B" tag="cons" ref="0 100 101" />
    <char cp="0E2C" tag="cons" ref="0 100 101" />
    <char cp="0E2D" tag="cons" ref="0 100 101" />
    <char cp="0E2E" tag="cons" ref="0 100 101" />
    <char cp="0E30" when="follow-consonant-tone-sara-aa" tag="fv1" ref="0 100 101" />
    <char cp="0E31" when="between-consonant-and-ct" tag="av" ref="0 100 101" />
    <char cp="0E32" when="follows-consonant-tone" tag="fv1 sara-aa" ref="0 100 101" />
    <char cp="0E34" when="follows-consonant" tag="av" ref="0 100 101" />
    <char cp="0E35" when="follows-consonant" tag="av" ref="0 100 101" />
    <char cp="0E36" when="follows-consonant" tag="av" ref="0 100 101" />
    <char cp="0E37" when="follows-consonant" tag="av" ref="0 100 101" />
    <char cp="0E38" when="follows-consonant" tag="bv" ref="0 100 101" />
    <char cp="0E39" when="follows-consonant" tag="bv" ref="0 100 101" />
    <char cp="0E3A" when="follows-consonant" tag="bd" ref="0 100 101" comment="phinthu" />
    <char cp="0E40" when="precedes-consonant" tag="lv" ref="0 100 101" />
    <char cp="0E41" when="precedes-consonant" tag="lv" ref="0 100 101" />
    <char cp="0E42" when="precedes-consonant" tag="lv" ref="0 100 101" />
    <char cp="0E43" when="precedes-consonant" tag="lv" ref="0 100 101" />
    <char cp="0E44" when="precedes-consonant" tag="lv" ref="0 100 101" />
    <char cp="0E47" when="follows-consonant" tag="ad" ref="0 100 101" comment="maitaikhu" />
    <char cp="0E48" when="follows-consonant-av-bv" tag="tone" ref="0 100 101" />
    <char cp="0E49" when="follows-consonant-av-bv" tag="tone" ref="0 100 101" />
    <char cp="0E4A" when="follows-consonant-av-bv" tag="tone" ref="0 100 101" />
    <char cp="0E4B" when="follows-consonant-av-bv" tag="tone" ref="0 100 101" />
    <char cp="0E4C" when="follows-consonant-av-bv" tag="ad" ref="0 100 101" comment="thanthakhat" />
    <char cp="0E4D" when="follows-consonant-av-bv" tag="ad" ref="0 100 101" comment="nikhahit" />
    <char cp="0E4D 0E32" when="follows-consonant-tone" ref="0 100 101" comment="SARA AM sequence" />
  </data>
  <!--Rules section goes here-->
  <rules>
    <!--Character class definitions go here-->
    <class name="above-vowel" from-tag="av" comment="Any above vowel" />
    <class name="below-vowel" from-tag="bv" comment="Any below vowel" />
    <class name="consonant" from-tag="cons" comment="Any Consonant" />
    <class name="sara-aa" from-tag="sara-aa" comment="SARA AA" />
    <class name="tone" from-tag="tone" comment="Any tone mark" />
    <union name="c-av-bv" comment="Any consonant, vowel-above or vowel-below">
      <class by-ref="consonant" />
      <class by-ref="above-vowel" />
      <class by-ref="below-vowel" />
    </union>
    <union name="ct" comment="Any consonant or tone">
      <class by-ref="consonant" />
      <class by-ref="tone" />
    </union>
    <union name="ctaa" comment="Any consonant, tone or sara-aa">
      <class by-ref="consonant" />
      <class by-ref="tone" />
      <class by-ref="sara-aa" />
    </union>
    <!--Whole label evaluation and context rules go here-->
    <rule name="leading-combining-mark" comment="default WLE rule matching labels with leading combining marks">
      <start />
      <union>
        <class property="gc:Mn" />
        <class property="gc:Mc" />
      </union>
    </rule>
    <rule name="precedes-consonant" comment="WLE 7.2: check if current cp is preceding a consonant">
      <anchor />
      <look-ahead>
        <class by-ref="consonant" />
      </look-ahead>
    </rule>
    <rule name="follows-consonant" comment="WLE 7.3: check if current cp is following a consonant">
      <look-behind>
        <class by-ref="consonant" />
      </look-behind>
      <anchor />
    </rule>
    <rule name="between-consonant-and-ct" comment="WLE 7.4: check if current cp is in between a consonant and either tone or consonant">
      <look-behind>
        <class by-ref="consonant" />
      </look-behind>
      <anchor />
      <look-ahead>
        <class by-ref="ct" />
      </look-ahead>
    </rule>
    <rule name="follow-consonant-tone-sara-aa" comment="WLE 7.5: U+0E30 (THAI CHARACTER SARA A, ะ) can follow a consonant, a tone or U+0E32 (THAI CHARACTER SARA AA, า)">
      <look-behind>
        <class by-ref="ctaa" />
      </look-behind>
      <anchor />
    </rule>
    <rule name="follows-consonant-tone" comment="WLE 7.6, 7.9: check if current cp is following a consonant or a tone">
      <look-behind>
        <class by-ref="ct" />
      </look-behind>
      <anchor />
    </rule>
    <rule name="follows-consonant-av-bv" comment="WLE 7.7, 7.8: A tone-mark, THANTHAKHAT or NIKHAHIT can only follow a consonant, above-vowel or below-vowel">
      <look-behind>
        <class by-ref="c-av-bv" />
      </look-behind>
      <anchor />
    </rule>
    <!--Action elements go here - order defines precedence-->
    <action disp="invalid" match="leading-combining-mark" />
    <action disp="invalid" any-variant="out-of-repertoire-var" comment="any variant label with a code point out of repertoire is invalid" />
    <action disp="blocked" any-variant="blocked" />
    <action disp="allocatable" any-variant="allocatable" />
    <action disp="valid" comment="catch all" />
  </rules>
</lgr>