This file contains a set of reference Label Generation Rules (LGR) for the Chinese language for the second level. The starting point for the development of this LGR can be found in the related Root Zone LGR [RZ-LGR-1-Hani]. For details and additional background on the script, see "Proposal for a Chinese Script Root Zone LGR" [Proposal-Han], including the appendices [Proposal-Chinese-Appendices]. Note that while the Chinese Script Root Zone LGR forms the starting point, the reference LGR defined here covers the Chinese language. The format of this file follows [RFC 7940].
The Root Zone LGR for the Chinese script lists 19685 single code points. This repertoire is the union of the CDNC IDN Table 2.0/2018 and the dotAsia IDN Table, and it is a subset of [Unicode 6.3]. For more detail, see Section 5, "Repertoire" in [Proposal-Han]. (The proposal cited has been adopted for the Han script portion of the Root Zone LGR.)
For the second level, the repertoire is augmented by the ten ASCII digits, the 26 ASCII lowercase letters and U+002D HYPHEN-MINUS, for a total of 19722 repertoire elements. Unlike many other non-Latin second level reference LGRs, the Chinese LGR includes the basic ASCII Latin letters (a to z), because it is common practice in Chinese text to mix Han and ASCII characters. Including them therefore does not create confusability or additional security risks in the context of a second level LGR for the Chinese language. Their inclusion is also supported by current IDNA practice; see, for example, [200].
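The repertoire count above can be checked with a few lines of Python. This is a minimal sketch: `in_repertoire` is a hypothetical helper, and the Han set passed to it is only a placeholder for the 19685 Han code points listed in this file.

```python
import string

# Count stated in the text for the Han portion of the repertoire.
HAN_COUNT = 19685

# Second-level additions: ASCII digits, ASCII lowercase letters, U+002D.
ASCII_ADDITIONS = frozenset(string.digits + string.ascii_lowercase + "-")

TOTAL = HAN_COUNT + len(ASCII_ADDITIONS)  # 19685 + 37 = 19722


def in_repertoire(label: str, han_repertoire: frozenset) -> bool:
    """Return True if every code point of `label` is a repertoire element.

    `han_repertoire` is a placeholder for the Han code points of this LGR;
    loading them from the LGR file is out of scope for this sketch.
    Hyphen placement rules (leading/trailing hyphens) are not checked.
    """
    return all(ch in han_repertoire or ch in ASCII_ADDITIONS for ch in label)
```

Note that uppercase ASCII letters are rejected, since only lowercase letters are part of the repertoire.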
Note: This LGR contains 62 code points from the Unicode block CJK Unified Ideographs Extension B.
Code points listed in this file that are not used for the Chinese language serve only as targets for out-of-repertoire variants and are identified by a reflexive (identity) variant of type "out-of-repertoire-var". They do not form part of the repertoire.
Each code point is tagged with the script or scripts that the code point is used with, one or more tag values denoting character category, and one or more references documenting sufficient justification for inclusion in the repertoire, see "References" below.
According to Sections 6 and 7 of [Proposal-Han], the LGR defines semantically exchangeable variants (same pronunciation as well as same meaning) and visually identical variants. Many, but not all, of the semantic variants lead to variant labels with a disposition of "allocatable", as described below; all other variant labels are "blocked".
The source reference cited for each variant mapping entry indicates whether the variant mapping is the same as or different from the rules of CDNC, dotAsia, JGP and KGP, or whether it is based on input received during the Root Zone LGR integration process.
For example, a value of "201" means the variant mapping matches dotAsia in both code point and variant type, a value of "202" means the variant mapping matches dotAsia in code point but not in variant type, while a value of "203" means the variant mapping matches no dotAsia code point.
A value of "701" means the variant mapping agrees with input received during the Root Zone LGR integration process in both code point and variant type.
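As an illustration of how these values might be read when processing the file, the sketch below maps the four reference values explained above to their meanings. It assumes that the `ref` attribute of a variant holds a space-separated list of reference identifiers, as in [RFC 7940]; the names `SOURCE_REF_MEANING` and `describe_refs` are hypothetical.

```python
# Meanings of the source reference values spelled out in the text above.
# Other values (101-103, 301, 501, 801) are defined in [Proposal-Han] and
# are deliberately not guessed here.
SOURCE_REF_MEANING = {
    201: "matches dotAsia in both code point and variant type",
    202: "matches dotAsia in code point only, not in variant type",
    203: "matches no dotAsia code point",
    701: "agrees with Root Zone integration input in code point and variant type",
}


def describe_refs(refs: str) -> list[str]:
    """Describe a space-separated ref attribute value such as "201 701"."""
    return [
        SOURCE_REF_MEANING.get(int(r), "see [Proposal-Han]")
        for r in refs.split()
    ]
```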
The LGR uses the following variant types (the prefix "r-" marks a type used in a reflexive variant mapping, that is, it represents an instance of the original code point at that location, see Section 5.3.4 in [RFC 7940]):
The variant types "simp-1", "simp-2", "trad-1" and "trad-2", together with their corresponding actions, reduce the number of allocatable variant labels, limiting it to no more than five. Multiple allocatable labels can otherwise arise in the small number of variant sets that have more than one traditional or simplified variant.
Note that a label containing only reflexive mappings, including "r-neither", is an original label and receives a disposition of "valid". (See also "Chinese-specific Actions" below.)
The LGR does not define specific variant types to handle visually identical variants, but adopts "both" and "blocked" instead, as described in Section 7 of [Proposal-Han].
As much as possible, the scheme retains the same simplified and traditional mappings as existing second level domains. It does not change the simplified or traditional type of any variant code point; instead, it subdivides them into common simplified/traditional ones and extra simplified/traditional ones, and provides additional disposition rules that limit any allocatable variant label to one of these subgroups. While it does not allow the applicant to obtain arbitrary mixed labels from an unconstrained list of allocatable labels, it does allow the applicant to select one specific desired mixed variant as the original label.
The comment for each variant mapping entry indicates not only whether the mapping is the reflexive identity (and therefore applies to those code points in a label that match the original label), but also whether its type has been conflated with another type, a step introduced in Section 6.3 of [Proposal-Han] to reduce the number of multiple allocatable labels.
For example, a variant comment of "r-both-ms" indicates that, for a given code point, the variant is a code point that could in principle be used with both trad and simp ("r-both"), but because the same variant set has at least one other variant of type "simp" (or of one of the other simplified types), this variant is preferred in a traditional context: a "trad" label containing this variant should be "allocatable", but a "simp" label containing it should be "blocked". The "ms" in the naming convention means that the simplified aspect of the variant is "muted", even though inherently it would have been of type "r-both". Effectively, such a variant leads to the same dispositions as an "r-trad"; hence that is the variant type assigned to it, with only the comment indicating that the variant would inherently have been of type "r-both" but with its simp aspect muted.
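The muting convention can be pictured as removing one context from the set of contexts a reflexive type supports, then picking the type that covers what is left. The sketch below assumes that mapping from reflexive types to contexts, and assumes that a comment without an "-ms" suffix simply equals the assigned type; `assigned_type` is a hypothetical helper, not part of the LGR format.

```python
# Contexts (simplified/traditional) in which each reflexive type is
# acceptable, following the description of "r-both" above (assumption).
CONTEXTS = {
    "r-simp": frozenset({"simp"}),
    "r-trad": frozenset({"trad"}),
    "r-both": frozenset({"simp", "trad"}),
}


def assigned_type(comment: str) -> str:
    """Map a variant comment such as "r-both-ms" to the type actually assigned."""
    if not comment.endswith("-ms"):
        return comment  # no muting: comment and type coincide (assumption)
    inherent = comment[: -len("-ms")]
    remaining = CONTEXTS[inherent] - {"simp"}  # mute the simplified aspect
    for vtype, ctx in CONTEXTS.items():
        if ctx == remaining:
            return vtype
    raise ValueError(f"no reflexive type covers {sorted(remaining)}")
```

With this model, `assigned_type("r-both-ms")` returns "r-trad", matching the dispositions described above.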
None.
The LGR defines a number of actions that compute a label disposition based on whole label evaluation (WLE) rules or variant mapping types. Some of these are common to all LGRs, and some are specific to this LGR.
The actions include the default actions for LGRs as well as those needed to invalidate labels with misplaced combining marks. They are marked with ⍟. For a description, see [RFC 7940].
The LGR contains additional Chinese-specific actions as described in Sections 6 and 7 of [Proposal-Han]. These resolve the extended set of variant types into a disposition for variant labels of either "allocatable" or "blocked". Chinese-specific actions that are triggered by the LGR-specific variant types described above limit the "allocatable" variant labels to those that are either fully simplified or fully traditional. In addition, these actions return a disposition of "valid" for any original label, even one that mixes simplified and traditional code points (see also [RFC 3743] and [RFC 4713]). To account for original labels, reflexive variant mappings with an "r-" prefix are used (see [RFC 7940]).
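A simplified model of these Chinese-specific actions is sketched below. It assumes that "both" and "r-both" are compatible with either a simplified or a traditional label, and it collapses the finer "simp-1"/"simp-2"/"trad-1"/"trad-2" distinctions, so it illustrates the fully-simplified/fully-traditional rule rather than reproducing the LGR's actual action list. The names `SIMP_OK`, `TRAD_OK` and `disposition` are hypothetical.

```python
# Hypothetical groupings of the collected variant types that are compatible
# with a fully simplified or a fully traditional variant label.
SIMP_OK = {"simp", "r-simp", "both", "r-both"}
TRAD_OK = {"trad", "r-trad", "both", "r-both"}


def disposition(types) -> str:
    """Resolve the variant types collected for one label into a disposition.

    Rules are tried in order and the first match wins, in the spirit of how
    RFC 7940 evaluates its action elements.
    """
    types = set(types)
    if "blocked" in types:
        return "blocked"
    if all(t.startswith("r-") for t in types):
        return "valid"  # original label, even if mixed simp/trad
    if types <= SIMP_OK or types <= TRAD_OK:
        return "allocatable"  # fully simplified or fully traditional
    return "blocked"  # catch-all, which also covers "r-neither"
```

For example, an original label with types {"r-simp", "r-trad"} is "valid", while a variant label mixing "trad" with an unchanged "r-simp" code point is "blocked".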
Note: there is no action explicitly triggered by variant type "r-neither". Instead, it is implicitly handled by the "catch-all" action. Its main benefit is in explicitly documenting the status of the code point.
Note that variant mapping types are not symmetric: they depend on which code point is considered the source and which the target in a given mapping. As specified in [RFC 7940], mapping types are evaluated for each permutation of a label and its variants; code points that are unchanged in a given permutation are assigned the type of their reflexive mapping. The actions then evaluate the collected set of mapping types and resolve it into one of two dispositions for the variant label.
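The permutation step can be sketched with `itertools.product`: each code point contributes either itself (with its reflexive type) or one of its variants (with that mapping's type). The table below is a toy example built for illustration, not an excerpt from this LGR's data, and `variant_permutations` is a hypothetical helper. In this sketch, code points without any mapping stay unchanged and contribute no type.

```python
from itertools import product

# Toy variant table (illustrative only): code point -> list of
# (target, mapping type), including the reflexive mapping.
TOY_VARIANTS = {
    "国": [("国", "r-simp"), ("國", "trad")],
    "学": [("学", "r-simp"), ("學", "trad")],
}


def variant_permutations(label, table):
    """Yield (permuted_label, types) for every permutation of `label`."""
    # Code points without mappings stay as-is and contribute no type (None).
    options = [table.get(cp, [(cp, None)]) for cp in label]
    for combo in product(*options):
        text = "".join(target for target, _ in combo)
        types = tuple(t for _, t in combo if t is not None)
        yield text, types
```

For the label "国学" this yields four permutations: the original with types ("r-simp", "r-simp"), two mixed labels, and the fully traditional "國學" with types ("trad", "trad"). Each type tuple is what the actions then resolve into a disposition.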
For more information on how to assign a variant label disposition under this LGR, see Section 8, "Assigning Variant Dispositions to Labels" in [Proposal-Han]. The specification of variants in this reference LGR follows the guidelines in [RFC 8228].
This reference LGR for the Chinese language for the second level has been developed by Michel Suignard and Asmus Freytag, based on the Root Zone LGR for the Chinese script and the information contained or referenced therein; see [RZ-LGR-1-Hani]. Suitable extensions for the second level have been applied according to the [Guidelines]. The original proposal for a Root Zone LGR for Chinese, on which this LGR is based, was developed by the Chinese Generation Panel (CGP). For the methodology and contributors to the underlying Root Zone LGR, see Sections 4 and 9 in [Proposal-Han], as well as [RZ-LGR-Overview].
References [0] to [4] refer to the Unicode Standard versions in which the corresponding code points were initially encoded. References [100], [200], [300], [400], [500], and [600] correspond to sources given in [Proposal-Han] for justifying the inclusion of the corresponding code points. References [101], [102], [103], [201], [202], [203], [301], [501], [701], and [801] correspond to variant mapping types given in [Proposal-Han] for justifying these mapping types. Entries in the table may have multiple source reference values. Reference [150] indicates the source for common rules.
In addition, the following general references are cited in this document:
For additional detail on references cited in this document refer to the Table of References below. Where stable links to certain references are not publicly available, the data may be archived in [Proposal-Chinese-Appendices].
]]>