This document, together with the set of element LGRs, specifies an integrated collection of Label Generation Rules for the Root Zone. For more details on the Root Zone LGRs and their development see "Root Zone Label Generation Rules - LGR-4: Overview and Summary" [RZ-LGR-4-Overview]. This file is one of a set of LGR files that together form an integrated LGR for the DNS Root Zone [RZ-LGR-4]. The format of this file follows [RFC 7940].
This is a DRAFT document released for public comment; it is not final. Please see the announcement on the ICANN website for public comments on Root Zone 4 for details on how to submit comments.
The Label Generation Rules for the Root Zone (LGR-4) are integrated from the following set of script-specific element LGRs:
Each element LGR provides the complete specification for determining both the validity of a label in the given script and the set of allocatable (or blocked) variant labels. See Section 5.2, "Steps in Processing a Label" in [RZ-LGR-4-Overview].
Each LGR represents in full the underlying proposal for the script-based LGR, except for changes required by the integration process or for uniformity of presentation. See Section 3, "Integration and Contents of LGR-4" in [RZ-LGR-4-Overview].
This merged LGR has been machine generated by combining the data from all element LGRs plus a common preamble containing this description and the list of references used in this merged LGR. The merged LGR contains the union of the repertoire, variant mappings and Whole Label Evaluation (WLE) rules of the element LGRs as described in the following sections. Data that are necessarily script-dependent, such as the type for variant mappings, have been removed or replaced by default values.
When processing an applied-for label, this merged LGR presents the complete data and specification needed for conflict checking with any existing label, independent of script. This is in contrast to the script-specific element LGRs, each of which presents the complete data and specification to determine the validity and full set of allocatable variants for the label, when applied for under that script. See also Section 5, "Using the LGR" in [RZ-LGR-4-Overview].
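As a rough illustration of script-independent conflict checking, the following Python sketch groups code points that are connected by "blocked" mappings into variant sets and compares two labels through a canonical "index" form, in which each code point is replaced by the smallest member of its variant set. This is a simplification under stated assumptions, not the procedure used for the Root Zone: it handles only single code point mappings and ignores sequences, context rules and WLE rules. The function names and the sample mapping between U+0061 and U+0430 are illustrative.

```python
def build_index_map(mappings):
    """Map every code point to the smallest code point in its variant set.

    `mappings` is an iterable of (source, target) pairs of single code
    points (a hypothetical, simplified view of the merged variant data).
    """
    parent = {}

    def find(cp):
        # Follow parent links to the representative, shortening the path.
        parent.setdefault(cp, cp)
        while parent[cp] != cp:
            parent[cp] = parent[parent[cp]]
            cp = parent[cp]
        return cp

    for source, target in mappings:
        root_a, root_b = find(source), find(target)
        if root_a != root_b:
            low, high = sorted((root_a, root_b))
            parent[high] = low  # the smaller code point stays representative

    return {cp: find(cp) for cp in list(parent)}


def index_label(label, index_map):
    """Replace each code point by the representative of its variant set."""
    return "".join(chr(index_map.get(ord(ch), ord(ch))) for ch in label)


def conflicts(label_a, label_b, index_map):
    """Two labels collide if they reduce to the same index label."""
    return index_label(label_a, index_map) == index_label(label_b, index_map)


# Illustrative data only: a symmetric mapping between U+0061 and U+0430.
sample_map = build_index_map([(0x0061, 0x0430), (0x0430, 0x0061)])
print(conflicts("aa", "\u0430a", sample_map))
```

Because every mapping in the merged file is typed "blocked", a check of this kind only answers whether two labels collide; which variants are allocatable still has to be determined from the element LGR for the script applied for.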
The repertoire of the integrated Root Zone LGR is the cumulative repertoire of all the element LGRs that have been integrated into this version. Those repertoires, in turn, were developed based on [MSR-4], which is a subset of [Unicode 6.3].
As a Root Zone LGR, the repertoire includes neither digits nor the HYPHEN-MINUS.
For further details, see Section 3.2.1, "Repertoire" in [RZ-LGR-4-Overview].
Each code point or range is tagged with the script or scripts that the code point is used with, and a reference to the Unicode Standard in which the code point was first encoded; see "References" below. Some code points are also tagged with script-specific classifications. These tags have been prefixed with the Unicode script identifier in the merged LGR.
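The merging and tagging of the repertoire can be pictured with a small Python sketch. The in-memory model, the function name, the "sc:" and "Script-tag" naming patterns and the sample data are all assumptions made for illustration; the actual merged LGR expresses this information in the XML format of [RFC 7940] and also carries the Unicode version references, which are omitted here.

```python
from collections import defaultdict


def merge_repertoires(element_lgrs):
    """Union the repertoires of several element LGRs.

    `element_lgrs` maps a script identifier (for example "Jpan") to a dict
    of code point -> set of script-specific tags (hypothetical model).
    Returns a dict of code point -> set of tags for the merged repertoire.
    """
    merged = defaultdict(set)
    for script, repertoire in element_lgrs.items():
        for code_point, tags in repertoire.items():
            # Record which script(s) the code point is used with.
            merged[code_point].add(f"sc:{script}")
            # Prefix script-specific tags so they stay distinguishable.
            for tag in tags:
                merged[code_point].add(f"{script}-{tag}")
    return dict(merged)


# Illustrative data only.
sample = {
    "Hani": {0x4E00: set()},
    "Jpan": {0x4E00: set(), 0x3042: {"hiragana"}},
}
for code_point, tags in sorted(merge_repertoires(sample).items()):
    print(f"U+{code_point:04X}", sorted(tags))
```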
The variant mappings in this LGR are the union of the non-reflexive variant mappings from all the element LGRs that have been integrated into this version of the Root Zone LGR. Because the disposition of variant labels, for example as "allocatable", is specific to each script, information related to that cannot be expressed in the script-neutral context of this merged file. Instead, all merged variant mappings are labeled as "blocked" in this document as needed for conflict checking. See also Section 3.2.2, "Variants" in [RZ-LGR-4-Overview].
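A minimal Python sketch of this merging step follows. It assumes a hypothetical representation in which each mapping is a (source, target, type) tuple with source and target given as tuples of code points; the type values in the sample data are illustrative only.

```python
def merge_variants(element_variants):
    """Union the non-reflexive variant mappings of all element LGRs.

    `element_variants` maps a script identifier to an iterable of
    (source, target, type) tuples (hypothetical model). The script-specific
    type is discarded and every merged mapping is labeled "blocked".
    """
    merged = {}
    for mappings in element_variants.values():
        for source, target, _script_specific_type in mappings:
            if source == target:
                continue  # reflexive mappings are not carried over
            merged[(source, target)] = "blocked"
    return merged


# Illustrative data only.
sample = {
    "Latn": [((0x0061,), (0x0061,), "out-of-repertoire-var"),
             ((0x0061,), (0x0430,), "blocked")],
    "Hani": [((0x56FD,), (0x570B,), "allocatable")],
}
for (source, target), variant_type in merge_variants(sample).items():
    print(source, "->", target, variant_type)
```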
As the repertoires are merged, any code point with a reflexive "out-of-repertoire-var" mapping in all element LGRs containing that code point will be considered not part of the merged repertoire and removed. If, however, any element LGR contains that code point as part of its repertoire, only the reflexive "out-of-repertoire-var" mapping will be removed. In contrast, both the code points and mappings are always retained in the element LGRs, independent of whether they are part of the merged repertoire.
Context Rules for Variants: some of the variants defined in this LGR are "effective null variants", that is, some code points in the source map to "nothing" in the target with all other code points unchanged. (Because mappings are symmetric, it does not matter whether it is the forward or reverse mapping that maps to "null".) Such variants require a context rule to keep the variant set well behaved. Symmetry requires the same context rule for both forward and reverse mappings.
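The rule for "out-of-repertoire-var" code points can be sketched as follows. The data model (a dict per element LGR recording the type of each code point's reflexive mapping, or None if it has none) and the function name are assumptions for illustration.

```python
OUT_OF_REPERTOIRE = "out-of-repertoire-var"


def merged_code_points(element_lgrs):
    """Return the set of code points that belong to the merged repertoire.

    `element_lgrs` maps a script identifier to a dict of
    code point -> type of its reflexive mapping, or None (hypothetical
    model). A code point is kept if at least one element LGR lists it
    without a reflexive "out-of-repertoire-var" mapping.
    """
    kept = set()
    for code_points in element_lgrs.values():
        for code_point, reflexive_type in code_points.items():
            if reflexive_type != OUT_OF_REPERTOIRE:
                kept.add(code_point)
    return kept


# Illustrative data only, using placeholder script names and code points.
sample = {
    "ScriptA": {0x0001: OUT_OF_REPERTOIRE, 0x0002: None},
    "ScriptB": {0x0001: None, 0x0003: OUT_OF_REPERTOIRE},
}
print(sorted(merged_code_points(sample)))  # prints [1, 2]
```

In this sample, code point 0x0001 is kept because one element LGR contains it as part of its repertoire, while 0x0003 is dropped because every element LGR that lists it marks it as out of repertoire.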
In other cases, the sequences or code points making up source and target are constrained by explicit context rules on the code points (or by implicit context rules defined for the adjacent code points). In such a case, any variants may require context rules that match the intersection between the effective contexts for both source and target; otherwise, a sequence might be considered valid in some variant label when it would not be valid in an equivalent context in an original label. See Section 6.4.2 in [RZ-LGR-4-Overview].
Some sequences may overlap, that is, they share a common part with another sequence or code point, so that, in partitioning a label into code points and sequences, more than one partition is possible. In these cases, variants have to be computed for all possible partitions. In some cases, context rules on sequences or variants are defined to curtail any unwanted side effects of such multiple partitioning, such as having each partition being part of a different variant label set, or generating a different index variant. For further discussion, see Section 6 in [RZ-LGR-4-Overview].
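To make the notion of multiple partitions concrete, the following Python sketch enumerates every way of splitting a label into repertoire items. It is only an illustration: the function name is hypothetical, repertoire items are modeled as plain strings, and context rules are not evaluated.

```python
def partitions(label, units):
    """Yield every way to split `label` into members of `units`.

    `label` is a string and `units` is a set of strings, each standing for
    a single code point or a code point sequence from the repertoire.
    """
    if not label:
        yield ()
        return
    for length in range(1, len(label) + 1):
        head = label[:length]
        if head in units:
            # Combine this first item with every partition of the rest.
            for rest in partitions(label[length:], units):
                yield (head,) + rest


# "ab" is defined as a sequence, and "a" and "b" also exist on their own,
# so the label "ab" can be partitioned in two ways.
print(list(partitions("ab", {"a", "b", "ab"})))
# prints [('a', 'b'), ('ab',)]
```

Variants would then be computed separately for each partition that is returned; a label for which no partition exists contains material outside the repertoire.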
The specification of variants in the Root Zone LGR follows the guidelines in [RFC 8228].
This merged LGR includes the cumulative set of character classes from all the element LGRs that have been integrated into this version of the Root Zone LGR. See Section 3.2.3, "Character Classes" in [RZ-LGR-4-Overview]. The names for any script-specific character classes have been prefixed with the Unicode script identifier.
This merged LGR includes the cumulative set of WLE and context rules and actions from all the element LGRs that have been integrated into this version of the Root Zone LGR. See Section 3.2.4, "Whole Label Evaluation Rules (WLE)" in [RZ-LGR-4-Overview]. See also the comments given for each rule or action.
The integrated LGR includes the set of required default WLE rules and actions applicable to the Root Zone and defined in [MSR-4]. They are marked with ⍟. These default rules include the restrictions defined in [RFC 5891] on placement of combining marks.
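As an example of such a default rule, [RFC 5891] does not allow a label to begin with a combining mark. The Python sketch below approximates that check using the Unicode general category. It is an assumption-laden illustration: the function name is hypothetical, and Python's unicodedata module reflects the Unicode version shipped with the interpreter rather than [Unicode 6.3]; in the LGR itself the restriction is expressed as a WLE rule.

```python
import unicodedata


def starts_with_combining_mark(label):
    """Return True if the first code point has a general category of
    Mn, Mc or Me, that is, if the label begins with a combining mark."""
    return bool(label) and unicodedata.category(label[0]).startswith("M")


# U+0301 COMBINING ACUTE ACCENT in leading and in non-leading position.
print(starts_with_combining_mark("\u0301a"))  # prints True
print(starts_with_combining_mark("a\u0301"))  # prints False
```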
The names for any script-specific rules have been prefixed with the Unicode script identifier.
Some actions are triggered by script-specific variant type values. While such actions are collected here, they are inoperative in the context of the merged LGR because in the merged LGR all variant type values have been mapped to "blocked".
The Root Zone Label Generation Rules - LGR-4 were integrated by the Integration Panel [IP] from a set of proposals for script-based root zone LGRs developed by community-based Generation Panels [GPs] in an open process with multiple public consultations defined in [Procedure] and [Guidelines]. For more information on the methodology and contributors, see [RZ-LGR-4-Overview], in particular Section 2, "Process of Integration" and Section 8, "Contributors".
According to the "Procedure to Develop and Maintain the Label Generation Rules for the Root Zone in Respect of IDNA Labels" [Procedure], the Integration Panel is tasked with reviewing each script LGR proposal for the Root Zone and, once the proposals have been accepted, delivering an integrated Root Zone LGR. Its members are experts in the areas of Unicode, Linguistics and Writing Systems, Domain Name System (DNS) and IDNA. The Integration Panel was constituted on September 6, 2013 with the following members:
In the listing of the repertoire, references starting at [101] refer to Unicode Standard versions in which the corresponding code points were initially encoded. References [119] and above correspond to the script-specific LGRs that include the repertoire item. Repertoire items may have more than one reference.
In addition, the following references are cited in this document:
For more details on references [120] and up and [121] and up, refer to the Table of References below, as well as to [RZ-LGR-4-Overview].
]]>