Label Generation Rules for the Root Zone — LGR-3
Note: this merged file is intended to be used for testing a label for conflicts with existing labels irrespective of their script.
However, it can neither be used to conclusively establish the validity of any label nor to generate the set of allocatable variant
labels (see Section 5.2, "Common LGR" in [LGR-3]).
Overview
This document, together with the set of element LGRs, specifies an integrated
set of Label Generation Rules for the Root Zone.
For more details on the Root Zone LGRs and their development see
"Root Zone Label Generation Rules - LGR-3: Overview and Summary" [LGR-3].
The format of this file follows [RFC 7940].
This version of the document and all associated files are released for Public Comment. For details on how to submit comments, please see the announcement on the ICANN website for public comments on the Root Zone LGR-3.
Element LGRs
The Label Generation Rules for the Root Zone (LGR-3) are integrated from the following set of
script-specific element LGRs:
- Root Zone Label Generation Rules for the Arabic Script (und-Arab)
[101]
- Root Zone Label Generation Rules for the Devanagari Script (und-Deva)
[102]
- Root Zone Label Generation Rules for the Ethiopic Script (und-Ethi)
[103]
- Root Zone Label Generation Rules for the Georgian Script (und-Geor)
[104]
- Root Zone Label Generation Rules for the Gujarati Script (und-Gujr)
[105]
- Root Zone Label Generation Rules for the Gurmukhi Script (und-Guru)
[106]
- Root Zone Label Generation Rules for the Hebrew Script (und-Hebr)
[107]
- Root Zone Label Generation Rules for the Kannada Script (und-Knda)
[108]
- Root Zone Label Generation Rules for the Khmer Script (und-Khmr)
[109]
- Root Zone Label Generation Rules for the Lao Script (und-Laoo)
[110]
- Root Zone Label Generation Rules for the Malayalam Script (und-Mlym)
[111]
- Root Zone Label Generation Rules for the Oriya Script (und-Orya)
[112]
- Root Zone Label Generation Rules for the Sinhala Script (und-Sinh)
[113]
- Root Zone Label Generation Rules for the Tamil Script (und-Taml)
[114]
- Root Zone Label Generation Rules for the Telugu Script (und-Telu)
[115]
- Root Zone Label Generation Rules for the Thai Script (und-Thai)
[116]
Each element LGR represents in full the underlying proposal for the script-based LGR, except for
changes required by the integration process or for uniformity of presentation.
See Section 3, "Integration and Contents of LGR-3" in [LGR-3].
Merged LGR
This merged LGR contains the union of the repertoire, variant mappings and Whole
Label Evaluation (WLE) rules as described in the following sections. Data that are necessarily
script-dependent, such as the type for variant mappings, have been removed or replaced by
default values.
When processing an applied-for label, this merged LGR presents the complete data and
specification needed for conflict checking with any existing label, independent of script, while each
script-specific element LGR presents the complete data and specification to determine the validity
and full set of allocatable variants for the label, when applied for under that script. See also Section 5,
"Using the LGR" in [LGR-3].
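The conflict check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a full [RFC 7940] processor: the table `VARIANT_SETS`, the helper names and the sample labels are hypothetical, and the sketch assumes only one-to-one code point variants that form symmetric, transitive sets with no context rules.

```python
# Hypothetical variant sets, for illustration only. Each set lists code
# points treated as variants of one another; real data comes from the
# merged LGR file.
VARIANT_SETS = [
    {0x0643, 0x06A9},
    {0x064A, 0x06CC},
]

# Map every code point in a set to the smallest member of that set.
INDEX = {cp: min(members) for members in VARIANT_SETS for cp in members}


def index_label(label):
    """Replace each code point by the smallest member of its variant set."""
    return tuple(INDEX.get(ord(ch), ord(ch)) for ch in label)


def conflicts(applied_for, existing):
    """Return the existing labels that share an index label with `applied_for`."""
    key = index_label(applied_for)
    return [label for label in existing if index_label(label) == key]
```

Two labels are reported as conflicting when they reduce to the same index label. A sketch like this says nothing about validity or allocatable variants, which require the script-specific element LGR.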
Repertoire
The repertoire of the integrated Root Zone LGR is the cumulative repertoire of all the Element
LGRs that have been integrated into this version. Those repertoires, in turn, were
developed based on [MSR-4], which is a subset of [Unicode 6.3].
As a Root Zone LGR, the repertoire includes neither digits nor the HYPHEN-MINUS.
For further details, see Section 3.2.1, "Repertoire" in [LGR-3].
Each code point or range is tagged with the script or scripts that the code point is used with, and
a reference to the Unicode Standard in which the code point was first encoded; see "References" below.
Some code points are also tagged with script-specific classifications. These tags have been prefixed
with the Unicode script identifier.
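As a rough illustration of how tagged repertoire entries might be consulted, the sketch below checks a label against a small hypothetical excerpt. The dictionary `REPERTOIRE`, the tag spelling `sc:Hebr` / `sc:Thai`, the reference id "0" and the function names are assumptions made for this example; the actual entries are those listed in the repertoire of this file.

```python
# Hypothetical excerpt: code point -> (script tags, reference ids).
# Tag spelling and the Unicode reference id "0" are placeholders.
REPERTOIRE = {
    0x05D0: ({"sc:Hebr"}, ["0", "107"]),  # HEBREW LETTER ALEF
    0x05D1: ({"sc:Hebr"}, ["0", "107"]),  # HEBREW LETTER BET
    0x0E01: ({"sc:Thai"}, ["0", "116"]),  # THAI CHARACTER KO KAI
}


def outside_repertoire(label):
    """Return the code points of `label` that are not in the repertoire."""
    return [ord(ch) for ch in label if ord(ch) not in REPERTOIRE]


def scripts_used(label):
    """Collect the script tags of all in-repertoire code points in `label`."""
    tags = set()
    for ch in label:
        entry = REPERTOIRE.get(ord(ch))
        if entry:
            tags |= entry[0]
    return tags
```

Because the repertoire contains neither digits nor the HYPHEN-MINUS, a label containing them is reported by `outside_repertoire` without any special casing.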
Variants
The variant mappings in this LGR are the union of the non-reflexive variant mappings from all the Element LGRs
that have been integrated into this version of the Root Zone LGR. Because the
disposition of variant labels, for example as "allocatable", is specific to each script, information related to that cannot be
expressed in the script-neutral context of this merged file. Instead, all merged variant mappings are
labeled as "blocked" in this document as needed for conflict checking. See also Section 3.2.2, "Variants" in [LGR-3].
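The merging step described above could look roughly like the following. This is only a sketch: the tuple layout `(source, target, type)`, the function name `merge_variants` and the sample type values are assumptions for illustration, not the procedure actually used by the Integration Panel.

```python
def merge_variants(element_lgrs):
    """Union the variant mappings of several element LGRs.

    `element_lgrs` is an iterable of lists of (source, target, type)
    tuples, where source and target are tuples of code points.
    """
    merged = set()
    for mappings in element_lgrs:
        for source, target, _script_type in mappings:
            if source == target:
                continue  # reflexive mappings are not carried over
            # The script-specific type is replaced by "blocked".
            merged.add((source, target, "blocked"))
    return sorted(merged)
```

Using a set means a mapping contributed by more than one element LGR appears only once in the result.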
Context Rules for Variants: some of the variants defined in this LGR are "effective null variants", that is,
some code points in the source map to "nothing" in the target with all other code points unchanged.
(Because mappings are symmetric, it does not matter whether it is the forward or reverse mapping that
maps to "null"). Such variants require a context rule to keep the variant set well behaved. Symmetry requires
the same context rule for both forward and reverse mappings.
In other cases, the sequences or code points making up source and target are constrained by explicit
context rules on the code points (or by implicit context rules defined for the adjacent code points).
In such a case, any variants may require context rules that match the intersection
of the effective contexts for source and target; otherwise, a sequence might be considered valid in some
variant label when it would not be valid in an equivalent context in an original label.
The specification of variants in the Root Zone LGR follows the guidelines in [RFC 8228].
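The symmetry requirement on context rules can be checked mechanically. The sketch below assumes each mapping is represented as a hypothetical `(source, target, when)` tuple, where `when` names the context rule or is `None`; the function name and the representation are illustrative and not part of [RFC 7940] or [RFC 8228].

```python
def asymmetric_mappings(mappings):
    """Return mappings lacking a reverse mapping with the same context rule.

    Each mapping is (source, target, when); source and target are tuples
    of code points and `when` is the name of a context rule or None.
    """
    present = set(mappings)
    return [
        (source, target, when)
        for (source, target, when) in mappings
        if (target, source, when) not in present
    ]
```

An empty result indicates that every forward mapping, including any effective null variant, has a reverse mapping carrying the same context rule.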
Character Classes
This merged LGR includes the cumulative set of character classes from all the Element LGRs
that have been integrated into this version of the Root Zone LGR. See Section 3.2.3, "Character Classes" in [LGR-3].
The names for any script-specific character classes have been prefixed with the Unicode script identifier.
Whole Label Evaluation (WLE) and Context Rules
This merged LGR includes the cumulative set of WLE and context rules and actions from all the Element LGRs
that have been integrated into this version of the Root Zone LGR. See Section 3.2.4, "Whole Label Evaluation Rules (WLE)"
in [LGR-3]. See also the comments given for each rule or action.
Default Whole Label Evaluation Rules and Actions
The integrated LGR includes the set of required default WLE rules and actions applicable to the Root Zone
and defined in [MSR-4]. They are marked with ⍟. These default rules include the restrictions defined in
[RFC 5891] on placement of combining marks.
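One of these restrictions, that a label must not begin with a combining mark, can be illustrated with a short check. This is a sketch of that single restriction only, not of the full set of default rules; the function name is hypothetical, and Python's `unicodedata` module reflects the Unicode version bundled with the interpreter rather than [Unicode 6.3].

```python
import unicodedata


def begins_with_combining_mark(label):
    """True if the first code point has a General_Category of Mn, Mc or Me."""
    return bool(label) and unicodedata.category(label[0]).startswith("M")
```

For example, a label starting with U+0E34 THAI CHARACTER SARA I (a nonspacing mark) would be flagged, while the same code point following a base letter would not.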
Script-specific WLE rules
The names for any script-specific rules have been prefixed with the Unicode script identifier.
Methodology and Contributors
The Root Zone Label Generation Rules - LGR-3 [LGR-3] were integrated by the Integration Panel [IP],
from a set of proposals for script-based root zone LGRs developed by community-based Generation
Panels [GPs] in an open process with multiple public consultations defined in [Procedure] and [Guidelines].
For more information on the methodology and contributors, see [LGR-3], in particular Section 2 "Process of Integration"
and Section 8, "Contributors".
References
In the listing of the repertoire, references starting at [0] refer to Unicode Standard versions in which the
corresponding code points were initially encoded. References [100] and above correspond to the script-specific
LGRs that include the repertoire item. Repertoire items may have more than one reference.
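The numbering convention above can be applied as in the following sketch, which assumes the references of a repertoire item are available as a space-separated string of numeric ids. The function name `classify_refs` and the sample input are illustrative assumptions.

```python
def classify_refs(ref_attr):
    """Split space-separated reference ids into Unicode and script-LGR groups."""
    unicode_refs, script_lgr_refs = [], []
    for token in ref_attr.split():
        # Ids of 100 and above point to script-specific LGRs.
        if int(token) >= 100:
            script_lgr_refs.append(token)
        else:
            unicode_refs.append(token)
    return unicode_refs, script_lgr_refs
```

With an input such as "0 101 107", the first group would hold the Unicode version reference and the second the two script-specific LGR references.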
In addition, the following references are cited in this document:
- [GPs]
- Internet Corporation for Assigned Names and Numbers (ICANN), "Generation Panels", https://community.icann.org/display/croscomlgrprocedure/Generation+Panels
(Accessed on 20 Nov. 2015)
- [Guidelines]
- Internet Corporation for Assigned Names and Numbers (ICANN), "Guidelines for Developing Script-Specific Label Generation Rules for Integration into the Root Zone LGR".
(Los Angeles, California: ICANN, December 2014), https://community.icann.org/download/attachments/43989034/Guidelines-for-LGR-2014-12-02.pdf
- [IP]
- Internet Corporation for Assigned Names and Numbers (ICANN), "Integration Panel", https://community.icann.org/display/croscomlgrprocedure/Integration+Panel
(Accessed on 20 Nov. 2015)
- [LGR-3]
- Integration Panel, "Root Zone Label Generation Rules - LGR-3: Overview and Summary", 25 April 2019, https://www.icann.org/sites/default/files/lgr/lgr-3-overview-25apr19-en.pdf
- [MSR-4]
- Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019,
https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf
- [Procedure]
- Internet Corporation for Assigned Names and Numbers, "Procedure to Develop and Maintain the Label Generation Rules for the Root Zone in Respect of IDNA Labels."
(Los Angeles, California: ICANN, March, 2013) http://www.icann.org/en/resources/idn/variant-tlds/draft-lgr-procedure-20mar13-en.pdf
- [Proposal-Arabic]
- TF-AIDN, "Proposal for Arabic Script Root Zone LGR", 18 November 2015 (PDF), https://www.icann.org/en/system/files/files/arabic-lgr-proposal-18nov15-en.pdf
- [Proposal-Devanagari]
- Neo-Brahmi Generation Panel, "Proposal for the Devanagari Script Root Zone LGR", 22 April 2019 (PDF), https://www.icann.org/en/system/files/files/proposal-devanagari-lgr-22apr19-en.pdf
- [Proposal-Ethiopic]
- Ethiopic Script Generation Panel, "Proposal for Ethiopic Script Root Zone LGR", 17 May 2017 (PDF), https://www.icann.org/en/system/files/files/proposal-ethiopic-lgr-17may17-en.pdf
- [Proposal-Georgian]
- Georgian Script Generation Panel, "Proposal for the Georgian Script Root Zone LGR", 24 November 2016 (PDF), https://www.icann.org/en/system/files/files/proposal-georgian-lgr-24nov16-en.pdf
- [Proposal-Gujarati]
- Neo-Brahmi Generation Panel, "Proposal for the Gujarati Script Root Zone LGR", 6 March 2019 (PDF), https://www.icann.org/en/system/files/files/proposal-gujarati-lgr-06mar19-en.pdf
- [Proposal-Gurmukhi]
- Neo-Brahmi Generation Panel, "Proposal for the Gurmukhi Script Root Zone LGR", 22 April 2019 (PDF), https://www.icann.org/en/system/files/files/proposal-gurmukhi-lgr-22apr19-en.pdf
- [Proposal-Hebrew]
- Hebrew Generation Panel, "Proposal for a Hebrew Script Root Zone Label Generation Ruleset (LGR)", 24 April 2019 (PDF), https://www.icann.org/en/system/files/files/proposal-hebrew-lgr-24Apr2019-en.pdf
- [Proposal-Kannada]
- Neo-Brahmi Generation Panel, "Proposal for the Kannada Script Root Zone LGR", 6 March 2019 (PDF), https://www.icann.org/en/system/files/files/proposal-kannada-lgr-06mar19-en.pdf
- [Proposal-Khmer]
- Khmer Generation Panel, "Proposal for Khmer Script Root Zone Label Generation Rules (LGR)", 15 August 2016 (PDF), https://www.icann.org/en/system/files/files/proposal-khmer-lgr-15aug16-en.pdf
- [Proposal-Lao]
- Lao Script Generation Panel, "Proposal for a Lao Script Root Zone LGR", 31 January 2017 (PDF), https://www.icann.org/en/system/files/files/proposal-lao-lgr-31jan17-en.pdf
- [Proposal-Malayalam]
- Neo-Brahmi Generation Panel, "Proposal for the Malayalam Script Root Zone LGR", 22 April 2019 (PDF), https://www.icann.org/en/system/files/files/proposal-malayalam-lgr-22apr19-en.pdf
- [Proposal-Oriya]
- Neo-Brahmi Generation Panel, "Proposal for the Oriya Script Root Zone LGR", 6 March 2019 (PDF), https://www.icann.org/en/system/files/files/proposal-oriya-lgr-06mar19-en.pdf
- [Proposal-Sinhala]
- Neo-Brahmi Generation Panel, "Proposal for the Sinhala Script Root Zone LGR", 22 April 2019 (PDF), https://www.icann.org/en/system/files/files/proposal-sinhala-lgr-22apr19-en.pdf
- [Proposal-Tamil]
- Neo-Brahmi Generation Panel, "Proposal for the Tamil Script Root Zone LGR", 6 March 2019 (PDF), https://www.icann.org/en/system/files/files/proposal-tamil-lgr-06mar19-en.pdf
- [Proposal-Telugu]
- Neo-Brahmi Generation Panel, "Proposal for the Telugu Script Root Zone LGR", 6 March 2019 (PDF), https://www.icann.org/en/system/files/files/proposal-telugu-lgr-06mar19-en.pdf
- [Proposal-Thai]
- The Generation Panel for the Thai Script LGR, "Proposal for the Thai Script Root Zone LGR", 25 May 2017 (PDF), https://www.icann.org/en/system/files/files/proposal-thai-lgr-25may17-en.pdf
- [RFC 5891]
- J. Klensin, "Internationalized Domain Names in Applications (IDNA): Protocol", RFC 5891, August 2010, https://www.rfc-editor.org/info/rfc5891
- [RFC 6365]
- Hoffman, P. and J. Klensin, "Terminology Used in Internationalization in the IETF", BCP 166, RFC 6365, DOI 10.17487/RFC6365, September 2011, https://www.rfc-editor.org/info/rfc6365
- [RFC 7940]
- Davies, K. and A. Freytag, "Representing Label Generation Rulesets Using XML", RFC 7940, August 2016, https://www.rfc-editor.org/info/rfc7940
- [RFC 8228]
- A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", RFC 8228, August 2017,
https://www.rfc-editor.org/info/rfc8228
- [Unicode 6.3]
- The Unicode Consortium. The Unicode Standard, Version 6.3.0, (Mountain View,
CA: The Unicode Consortium, 2013. ISBN 978-1-936213-08-5) http://www.unicode.org/versions/Unicode6.3.0/
For more details on references [0] and up and [100] and up, refer to the Table of References below, as well as to [LGR-3].
]]>