This file contains a set of Label Generation Rules for the Korean script for the Root Zone. "Korean script" usually means "Hangul". However, in the context of the Korean LGR, "Korean script" refers to the union of Hangul and Hanja.
For more details on this LGR and its development, as well as background on the script, see "Proposal for a Korean Script Root Zone Label Generation Ruleset (LGR)" [Proposal-Korean], including the appendices [Proposal-Korean-Appendices].
The format of this file follows [RFC 7940].
The repertoire includes 11172 Hangul syllables and 4758 Hanja characters for a total of 15930 code points (11172 + 4758). For more details, see Section 5, "Repertoire" in [Proposal-Korean].
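The repertoire total can be checked with a short sketch. It assumes that the 11172 Hangul syllables are exactly the contiguous Unicode Hangul Syllables range U+AC00..U+D7A3. The Hanja count is taken from the text above, and the variable names are illustrative only.

```python
# Assumption: the Hangul part of the repertoire is the full Unicode
# Hangul Syllables range U+AC00..U+D7A3 (19 initials x 21 medials x 28 finals).
HANGUL_FIRST = 0xAC00
HANGUL_LAST = 0xD7A3

hangul_count = HANGUL_LAST - HANGUL_FIRST + 1
hanja_count = 4758  # figure stated in this LGR, not derived here

total = hangul_count + hanja_count
print(hangul_count, hanja_count, total)
```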
The repertoire is based on [MSR-4], which is a subset of [Unicode 6.3].
Code points outside the Korean script repertoire that are listed in this file are targets for out-of-repertoire variants and are identified by a reflexive (identity) variant of type "out-of-repertoire-var". They do not form part of the repertoire.
Each code point is tagged with the script or scripts that the code point is used with, and one or more references documenting sufficient justification for inclusion in the repertoire; see "References" below.
There are no variants defined between Hangul syllables.
This LGR contains 289 variant sets, which can be classified as follows:
1. 283 variant sets consist of variants between two or more Hanja characters that are part of this LGR repertoire. One of these variant sets also includes a Hangul syllable.
2. 4 variant sets consist of variants between one or more in-repertoire Hanja characters and a Hangul syllable. This includes one variant set already included in the previous group.
3. 3 variant sets consist of variants between one out-of-repertoire Hanja character and a Hangul syllable. One variant set also includes one in-repertoire Hanja character.
Because one variant set is classified in both groups 1 and 2, the total number of variant sets is 283 + 4 + 3 - 1 = 289.
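The count above is a simple inclusion-exclusion sum. A minimal sketch, using the group sizes stated in the list and hypothetical variable names:

```python
# Group sizes as stated in the classification above.
group_sizes = {
    "group_1": 283,  # variants among in-repertoire Hanja characters
    "group_2": 4,    # in-repertoire Hanja with a Hangul syllable
    "group_3": 3,    # out-of-repertoire Hanja with a Hangul syllable
}

# One variant set is listed in both group 1 and group 2,
# so it must be subtracted once to avoid double counting.
double_counted = 1

total_variant_sets = sum(group_sizes.values()) - double_counted
print(total_variant_sets)
```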
All non-reflexive variant mappings defined are of type "blocked" and all reflexive ones of type "out-of-repertoire-var". For more details, see Section 6 "Variants" in [Proposal-Korean].
In addition to the variants listed in this LGR, integration into the Root Zone LGR will result in many additional variants to out-of-repertoire code points as a result of variants defined in other LGRs, including the effects of transitivity. However, the list of in-repertoire variants in this LGR is exhaustive.
This LGR defines the following named character classes:
The LGR includes the set of required default WLE rules and actions applicable to the Root Zone and defined in [MSR-4]. They are marked with ⍟.
A label may consist of Hangul syllables or Hanja characters only; no mixes are allowed. See also Section 7, "Whole Label Evaluation Rules (WLE)" in [Proposal-Korean].
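The no-mixing rule can be illustrated with a small sketch. It is not the WLE rule as encoded in this file: `SAMPLE_HANJA` is a hypothetical stand-in for the 4758-character Hanja repertoire, the Hangul test assumes the Unicode range U+AC00..U+D7A3, and the function names are illustrative only.

```python
def is_hangul_syllable(ch: str) -> bool:
    """True if ch lies in the Unicode Hangul Syllables range."""
    return 0xAC00 <= ord(ch) <= 0xD7A3

# Hypothetical stand-in set; the real repertoire is the list of
# Hanja code points defined in this LGR.
SAMPLE_HANJA = frozenset("韓國大學")

def label_kind(label: str, hanja=SAMPLE_HANJA):
    """Return 'hangul' or 'hanja' for an unmixed label, else None."""
    if not label:
        return None
    if all(is_hangul_syllable(ch) for ch in label):
        return "hangul"
    if all(ch in hanja for ch in label):
        return "hanja"
    return None  # mixed label or code point outside the sample sets

for candidate in ("한국", "韓國", "한國"):
    print(candidate, label_kind(candidate))
```

Under these assumptions, a label mixing a Hangul syllable with a Hanja character yields `None`, which corresponds to an invalid label.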
The Root Zone LGR for the Korean script was developed by the Korean Script Generation Panel (KGP). For further details on methodology and contributors, see Sections 4 and 8 in [Proposal-Korean].
References [0] to [3] refer to the Unicode Standard versions in which corresponding code points were initially encoded. For example, a reference value of [3] means a character was first encoded in Unicode Version 3.0.
References [101] and up indicate that the code point is included in one of the following source references: [101] 'kIRG_KSource' values starting with K0 in the Unihan database and [102] 'kIICore' values including the letter 'K' (indicating source) in the Unihan database.
Entries in the table may have multiple source reference values.
In addition, the following general references are cited in this document:
For additional detail on references cited in the remainder of the document, refer to the Table of References below.
Where links to certain references are not available, the data may be archived as part of [Proposal-Korean].
This XML file was prepared on 2020-09-01 based on K-LGR v2.1 (2020-09-01, klgp220_210h)
]]>