﻿<?xml version="1.0" encoding="utf-8"?>
<lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0">
  <meta>
    <version comment="Proposed LGR for Khmer">2</version>
    <date>2016-08-15</date>
    <language>und-Khmr</language>
    <scope type="domain">.</scope>
    <unicode-version>6.3.0</unicode-version>
    <description type="text/html">
    <![CDATA[
<h1>Label Generation Rules for Khmer</h1>
<h2>Overview</h2>
<p>For more details on this proposal, see "Proposal for a Khmer Script Root Zone LGR" [Proposal].</p>
<h2>Repertoire</h2>
<p>The repertoire is defined according to Section 5, "Repertoire", in [Proposal].</p>
<h2>Variants</h2>
<p>According to Section 6 "Variants", in "[Proposal]", two Khmer consonants are variants of each other in their subscript form only. This is captured by a variant relation between the two subscript sequences, which are listed explicitly in the repertoire.</p>
<h2>Character Classes</h2>
<p>Consonants that do not take a subscript form have been given the tag "base-only". </p>
<p>The character U+17C9 (៉) KHMER SIGN MUUSIKATOAN is used with a subset of <i>second series</i> consonants and with U+1794 (ប) KHMER LETTER BA. Code points from this subset have been given the tag "series-two". </p>
<p>The character U+17CA (៊) KHMER SIGN TRIISAP is used with a subset of <i>first series</i> consonants. Code points from this subset have been given the tag "series-one". The series, and the sets of code points used with these two signs, are defined in [210]. </p>
<p>The character U+17CB (់) KHMER SIGN BANTOC is used with a subset of consonants; that subset has been given the tag "series-three" [205]. </p>
<p>The characters U+17C9 (៉) KHMER SIGN MUUSIKATOAN and U+17CA (៊) KHMER SIGN TRIISAP are collectively known as consonant shifters and have been given the tag "shifter". </p>
<p>The character U+17CC (៌) KHMER SIGN ROBAT has been given the tag "robat". </p>
<p>The character U+17C6 (ំ) KHMER SIGN NIKAHIT is used with consonants and some dependent vowels. These dependent vowels have been given a tag "dependent-vowel-1" (see Section 5.3 in [Proposal]). The character U+17C7 (ះ) KHMER SIGN REAHMUK is used with consonants and some dependent vowels. These dependent vowels have been given a tag "dependent-vowel-2" (see Section 5.3 in [Proposal]). The code point U+17B7 (ិ) KHMER VOWEL SIGN I has been given a tag "dependent-vowel-3".</p>
<h2>Whole Label Evaluation (WLE) and Context Rules</h2>
            <h3>Default Whole Label Evaluation Rules</h3> 
            <p>The LGR includes the set of required default WLE rules and actions applicable to the Root Zone and defined in [MSR-2]. They are marked with &#x235F;. The default prohibition on leading combining marks is equivalent to ensuring that a label only starts with a consonant or independent-vowel.</p> 
      <h3>Khmer-specific Rules</h3>
<p>Rules provided in the LGR as described in Section 7 of [Proposal] reasonably restrict labels so that they conform to Khmer syllable structure. 
Where possible these constraints are presented as context rules. </p>
<p>The rules are: </p>
<ul>
  
  <li><b>Subscript-consonant</b>  A rule that specifies the allowable subscript consonant sequence: 17D2 followed by a consonant that is not base-only. See Section 7.2 in [Proposal]</li>
  
  <li><b>Subscript-consonant-limit</b>  A rule that limits consecutive subscript consonants to two; a label containing three in a row is invalid. See Section 7.3 in [Proposal]</li>
  
  <li><b>Coeng-context</b>  A context rule for 17D2, which must have a consonant (base-only or not) before it and a consonant that is not base-only after it. See Section 7.4 in [Proposal]</li>
  
  <li><b>Follows-consonant-robat-shifter</b>   A context rule for those code points that must follow a consonant, robat or shifter. See Section 7.5 in [Proposal]</li>
  
  <li><b>Follows-series-two</b>   A context rule for those code points that must follow series-two consonants. See Section 7.6 in [Proposal]</li>
  
  <li><b>Follows-series-one</b>  A context rule for those code points that must follow series-one consonants. See Section 7.7 in [Proposal]</li>
  
  <li><b>Follows-consonant</b>  A context rule for those code points that must always follow a consonant. See Section 7.8 in [Proposal]</li>
  
  <li><b>Follows-consonant-shifter</b> A context rule for those code points that must always follow a consonant or shifter. See Section 7.9 in [Proposal]</li>
  
  <li><b>Follows-consonant-depvowel-1-shifter</b>  A context rule for those code points that must follow a consonant, a shifter or a dependent-vowel-1 code point. See Section 7.10 in [Proposal]</li>
  
  <li><b>Follows-consonant-depvowel-2-shifter</b>  A context rule for those code points that must follow a consonant, a shifter or a dependent-vowel-2 code point. See Section 7.11 in [Proposal]</li>
  
  <li><b>Follows-series-three</b>  A context rule for those code points that must follow series-three consonants. See Section 7.12 in [Proposal]</li>
  
  <li><b>Follows-consonant-or-vowel-i</b>  A context rule for those code points that must follow a consonant or 17B7 KHMER VOWEL SIGN I. See Section 7.13 in [Proposal]</li>
  
</ul>
      <h2>References</h2> 
       <p>Reference value ("ref" attribute) 3 refers to the Unicode Standard version
              in which the corresponding code points were initially encoded. Reference values 203, 204, 205, 206, 207, 208, 209 and 210 correspond to sources justifying the inclusion or classification of the corresponding
              code points. A single code point or range may have multiple source reference values.</p>
  
       <p>Reference values ("ref" attribute) from 100 and up refer to specific sources cited for the
 corresponding code points in [Proposal].</p>

      <p>In addition, the following references are cited in this document:</p>
      <dl class="references">
        <dt>[MSR-2]</dt>
        <dd>Integration Panel, "Maximal Starting Repertoire — MSR-2 Overview and Rationale", 14 April 2015
         https://www.icann.org/en/system/files/files/msr-2-overview-14apr15-en.pdf</dd>
        <dt>[Proposal]</dt>
        <dd><i>Proposal for Khmer Script Root Zone LGR,  https://www.icann.org/resources/pages/lgr-proposals-2015-12-01-en</i></dd> 
        <dt>[Unicode6.3]</dt>
        <dd>The Unicode Consortium. The Unicode Standard, Version 6.3.0, (Mountain View, CA: The Unicode Consortium, 2013. ISBN 978-1-936213-08-5) 
        http://www.unicode.org/versions/Unicode6.3.0/</dd>
     </dl>
     <p>For more details on reference [3] and on references [100] and up, refer to the <a href="#table_of_references">Table of References</a> below.</p> 


]]></description>
    <references>
      <reference id="3" comment="Any code point cited was originally encoded in Unicode Version 3.0">
        The Unicode Consortium. The Unicode Standard, Version 6.3.0, (Mountain View, CA: The Unicode Consortium, 2013. ISBN 978-1-936213-08-5)</reference>
      <reference id="100" comment="Subsection, Subscript Consonant Signs, pages 616-618">
        The Unicode Consortium. The Unicode Standard, Version 8.0.0, (Mountain View, CA: The Unicode Consortium, 2015. ISBN 978-1-936213-10-8), Chapter 16: Southeast Asia, section 16.4: Khmer, http://www.unicode.org/versions/Unicode8.0.0/ch16.pdf</reference>
      <reference id="203" comment="Any code point cited is for consonant characters">
        PRIMARY SCHOOL GRADE 1, MOEYS, ISBN 9-789-995-001-674, Publication 2015, Figure 1</reference>
      <reference id="204" comment="Any code point cited is for vowel signs">
        PRIMARY SCHOOL GRADE 1, MOEYS, ISBN 9-789-995-001-674, Publication 2015, Figure 2</reference>
      <reference id="205" comment="BANTOC sign and its context">
        Dr. Prum Mol (Khmer Linguist), Grammar of Modern Khmer Language, Linguist of National Institute of Language, Royal Academy of Cambodia, 2006, pages 37-38: the Bantoc sign section starts towards the bottom of page 37, and page 38 lists the contexts of the Bantoc sign</reference>
      <reference id="206" comment="Any code point cited is for independent vowel characters">
        PRIMARY SCHOOL GRADE 1, MOEYS, ISBN 9-789-995-001-674, Publication 2015, Figure 6</reference>
      <reference id="207" comment="Any code point cited is for diacritics">
        PRIMARY SCHOOL GRADE 1, MOEYS, ISBN 9-789-995-001-674, Publication 2015, Figure 7</reference>
      <reference id="208" comment="Any code point cited is for diacritics">
        PRIMARY SCHOOL GRADE 1, MOEYS, ISBN 9-789-995-001-674, Publication 2015, Figure 8</reference>
      <reference id="209" comment="Any code point cited is for diacritics">
        PRIMARY SCHOOL GRADE 1, MOEYS, ISBN 9-789-995-001-674, Publication 2015, Figure 9</reference>
      <reference id="210" comment="Sets of code points used with certain signs">
        Franklin E. Huffman, Cambodian System of Writing and Beginning Reader, Yale University, 1970, reprinted 1987</reference>
    </references>
  </meta>
  <data>
    <char cp="1780" tag="consonant series-three" ref="3 203 205" />
    <char cp="1781" tag="consonant" ref="3 203" />
    <char cp="1782" tag="consonant" ref="3 203" />
    <char cp="1783" tag="consonant" ref="3 203" />
    <char cp="1784" tag="consonant series-three series-two" ref="3 203 205 210" />
    <char cp="1785" tag="consonant series-three" ref="3 203 205" />
    <char cp="1786" tag="consonant" ref="3 203" />
    <char cp="1787" tag="consonant" ref="3 203" />
    <char cp="1788" tag="consonant" ref="3 203" />
    <char cp="1789" tag="consonant series-three series-two" ref="3 203 205 210" />
    <char cp="178A" tag="consonant" ref="3 203" />
    <char cp="178B" tag="consonant" ref="3 203" />
    <char cp="178C" tag="consonant" ref="3 203" />
    <char cp="178D" tag="consonant" ref="3 203" />
    <char cp="178E" tag="consonant" ref="3 203" />
    <char cp="178F" tag="consonant series-three" ref="3 203 205" />
    <char cp="1790" tag="consonant" ref="3 203" />
    <char cp="1791" tag="consonant" ref="3 203" />
    <char cp="1792" tag="consonant" ref="3 203" />
    <char cp="1793" tag="consonant series-three" ref="3 203 205" />
    <char cp="1794" tag="consonant series-one series-three series-two" ref="3 203 205 210" />
    <char cp="1795" tag="consonant" ref="3 203" />
    <char cp="1796" tag="consonant" ref="3 203" />
    <char cp="1797" tag="consonant" ref="3 203" />
    <char cp="1798" tag="consonant series-two" ref="3 203 210" />
    <char cp="1799" tag="consonant series-two" ref="3 203 210" />
    <char cp="179A" tag="consonant series-two" ref="3 203 210" />
    <char cp="179B" tag="consonant series-three" ref="3 203 205" />
    <char cp="179C" tag="consonant series-two" ref="3 203 210" />
    <char cp="179F" tag="consonant series-one series-three" ref="3 203 205 210" />
    <char cp="17A0" tag="consonant series-one" ref="3 203 210" />
    <char cp="17A1" tag="base-only consonant" ref="3 203" />
    <char cp="17A2" tag="consonant series-one" ref="3 203 210" />
    <char cp="17A5" tag="independent-vowel" ref="3 206" />
    <char cp="17A6" tag="independent-vowel" ref="3 206" />
    <char cp="17A7" tag="independent-vowel" ref="3 206" />
    <char cp="17AA" tag="independent-vowel" ref="3 206" />
    <char cp="17AB" tag="independent-vowel" ref="3 206" />
    <char cp="17AC" tag="independent-vowel" ref="3 206" />
    <char cp="17AD" tag="independent-vowel" ref="3 206" />
    <char cp="17AE" tag="independent-vowel" ref="3 206" />
    <char cp="17AF" tag="independent-vowel" ref="3 206" />
    <char cp="17B0" tag="independent-vowel" ref="3 206" />
    <char cp="17B1" tag="independent-vowel" ref="3 206" />
    <char cp="17B3" tag="independent-vowel" ref="3 206" />
    <char cp="17B6" when="follows-consonant-robat-shifter" tag="dependent-vowel dependent-vowel-1" ref="3 204" />
    <char cp="17B7" when="follows-consonant-robat-shifter" tag="dependent-vowel dependent-vowel-2 dependent-vowel-3" ref="3 204" />
    <char cp="17B8" when="follows-consonant-robat-shifter" tag="dependent-vowel" ref="3 204" />
    <char cp="17B9" when="follows-consonant-robat-shifter" tag="dependent-vowel dependent-vowel-2" ref="3 204" />
    <char cp="17BA" when="follows-consonant-robat-shifter" tag="dependent-vowel" ref="3 204" />
    <char cp="17BB" when="follows-consonant-robat-shifter" tag="dependent-vowel dependent-vowel-1 dependent-vowel-2" ref="3 204" />
    <char cp="17BC" when="follows-consonant-robat-shifter" tag="dependent-vowel" ref="3 204" />
    <char cp="17BD" when="follows-consonant-robat-shifter" tag="dependent-vowel" ref="3 204" />
    <char cp="17BE" when="follows-consonant-robat-shifter" tag="dependent-vowel" ref="3 204" />
    <char cp="17BF" when="follows-consonant-robat-shifter" tag="dependent-vowel" ref="3 204" />
    <char cp="17C0" when="follows-consonant-robat-shifter" tag="dependent-vowel" ref="3 204" />
    <char cp="17C1" when="follows-consonant-robat-shifter" tag="dependent-vowel dependent-vowel-2" ref="3 204" />
    <char cp="17C2" when="follows-consonant-robat-shifter" tag="dependent-vowel" ref="3 204" />
    <char cp="17C3" when="follows-consonant-robat-shifter" tag="dependent-vowel" ref="3 204" />
    <char cp="17C4" when="follows-consonant-robat-shifter" tag="dependent-vowel dependent-vowel-2" ref="3 204" />
    <char cp="17C5" when="follows-consonant-robat-shifter" tag="dependent-vowel" ref="3 204" />
    <char cp="17C6" when="follows-consonant-depvowel-1-shifter" tag="sign" ref="3 204" />
    <char cp="17C7" when="follows-consonant-depvowel-2-shifter" tag="sign" ref="3 208" />
    <char cp="17C8" when="follows-consonant" tag="sign" ref="3 207 208 209" />
    <char cp="17C9" when="follows-series-two" tag="shifter" ref="3 207 208 209 210" />
    <char cp="17CA" when="follows-series-one" tag="shifter" ref="3 207 208 209 210" />
    <char cp="17CB" when="follows-series-three" tag="sign" ref="3 205 207 208 209" />
    <char cp="17CC" when="follows-consonant" tag="robat" ref="3 207 208 209" />
    <char cp="17CD" when="follows-consonant-or-vowel-i" tag="sign" ref="3 207 208 209" />
    <char cp="17D0" when="follows-consonant-shifter" tag="sign" ref="3 207 208 209" />

    <char cp="17D2" when="coeng-context" tag="coeng" ref="3 100" />
    
    <char cp="17D2 178A" when="follows-consonant" ref="3" comment="KHMER CONSONANT SIGN COENG DA">
      <var cp="17D2 178F" type="blocked" comment="subscript forms are homoglyphs" />
    </char>
    <char cp="17D2 178F" when="follows-consonant" ref="3" comment="KHMER CONSONANT SIGN COENG TA">
      <var cp="17D2 178A" type="blocked" comment="subscript forms are homoglyphs" />
    </char>
    
    
  </data>
  <!--Rules section goes here-->
  <rules>
    <!--Character class definitions go here-->
    <class name="consonant" from-tag="consonant" comment="Any consonant" />
    <difference name="consonant-but-not-base-only" comment="Any consonant that is not base-only">
      <class by-ref="consonant" />
      <class from-tag="base-only" />
    </difference>
    <class name="dependent-vowel-1" from-tag="dependent-vowel-1" />
    <class name="dependent-vowel-2" from-tag="dependent-vowel-2" />
    <class name="dependent-vowel-3" from-tag="dependent-vowel-3" />
    <class name="robat" from-tag="robat" />
    <class name="series-one" from-tag="series-one" />
    <class name="series-three" from-tag="series-three" />
    <class name="series-two" from-tag="series-two" />
    <class name="shifter" from-tag="shifter" />
    <!--Whole label evaluation and context rules go here-->
    <rule name="leading-combining-mark" comment="WLE Rule No.1: default WLE rule matching labels with leading combining marks ⍟">
      <start />
      <union>
        <class property="gc:Mn" />
        <class property="gc:Mc" />
      </union>
    </rule>
    <rule name="subscript-consonant" comment="WLE Rule No. 2: allowable subscript consonant sequence">
      <char cp="17D2" />
      <class by-ref="consonant-but-not-base-only" />
    </rule>
    <rule name="subscript-consonant-limit" comment="WLE Rule No. 3: more than two subscript consonants">
      <rule by-ref="subscript-consonant" count="3" />
    </rule>
    <rule name="coeng-context" comment="WLE Rule No. 4: checks for 17D2 and its surrounding code points">
      <look-behind>
        <class by-ref="consonant" />
      </look-behind>
      <anchor />
      <look-ahead>
        <class by-ref="consonant-but-not-base-only" />
      </look-ahead>
    </rule>
    <rule name="follows-consonant-robat-shifter" comment="WLE Rule No. 5: makes sure that dependent vowel follows a consonant or a shifter or a robat">
      <look-behind>
        <choice>
          <class by-ref="shifter" />
          <class by-ref="consonant" />
          <class by-ref="robat" />
        </choice>
      </look-behind>
      <anchor />
    </rule>
    <rule name="follows-series-two" comment="WLE Rule No. 6: checks sequence for shifter 17C9 MUUSIKATOAN">
      <look-behind>
        <class by-ref="series-two" />
      </look-behind>
      <anchor />
    </rule>
    <rule name="follows-series-one" comment="WLE Rule No. 7: checks sequence for shifter 17CA TRIISAP">
      <look-behind>
        <class by-ref="series-one" />
      </look-behind>
      <anchor />
    </rule>
    <rule name="follows-consonant" comment="WLE Rule No. 8: checks if sign code point or subscript consonant follows a consonant">
      <look-behind>
        <class by-ref="consonant" />
      </look-behind>
      <anchor />
    </rule>
    <rule name="follows-consonant-shifter" comment="WLE Rule No. 9: checks if 17D0 KHMER SIGN SAMYOKSANNYA follows a consonant or a shifter">
      <look-behind>
        <choice>
          <class by-ref="consonant" />
          <class by-ref="shifter" />
        </choice>
      </look-behind>
      <anchor />
    </rule>
    <rule name="follows-consonant-depvowel-1-shifter" comment="WLE Rule No. 10: checks if 17C6 KHMER SIGN NIKAHIT follows a consonant or a dependent vowel-1 or a shifter">
      <look-behind>
        <choice>
          <class by-ref="consonant" />
          <class by-ref="dependent-vowel-1" />
          <class by-ref="shifter" />
        </choice>
      </look-behind>
      <anchor />
    </rule>
    <rule name="follows-consonant-depvowel-2-shifter" comment="WLE Rule No. 11: checks if 17C7 KHMER SIGN REAHMUK follows a consonant or a dependent vowel-2 or a shifter">
      <look-behind>
        <choice>
          <class by-ref="consonant" />
          <class by-ref="dependent-vowel-2" />
          <class by-ref="shifter" />
        </choice>
      </look-behind>
      <anchor />
    </rule>
    <rule name="follows-series-three" comment="WLE Rule No. 12: checks if 17CB KHMER SIGN BANTOC code point follows a series-three consonant">
      <look-behind>
        <class by-ref="series-three" />
      </look-behind>
      <anchor />
    </rule>
    <rule name="follows-consonant-or-vowel-i" comment="WLE Rule No. 13: checks if 17CD KHMER SIGN TOANDAKHIAT follows a consonant, optionally with an intervening 17B7 KHMER VOWEL SIGN I (dependent-vowel-3)">
      <look-behind>
        <class by-ref="consonant" />
        <class by-ref="dependent-vowel-3" count="0:1" />
      </look-behind>
      <anchor />
    </rule>
    <!--Action elements go here - order defines precedence-->
    <action disp="invalid" match="leading-combining-mark" comment="labels with leading combining marks are invalid ⍟" />
    <action disp="invalid" any-variant="out-of-repertoire-var" comment="any variant label with a code point out of repertoire is invalid" />
    <action disp="invalid" match="subscript-consonant-limit" comment="any label with more than two subscript consonants in a row is invalid" />
    <action disp="blocked" any-variant="blocked" />
    <action disp="allocatable" any-variant="allocatable" />
    <action disp="valid" comment="catch all" />
  </rules>
</lgr>
