Root Zone Label Generation Rules for the Arabic Script
Overview
This file contains a set of Label Generation Rules (LGR) for the Arabic script for the Root Zone.
For more details on this LGR and its development, as well as background on the script, see
TF-AIDN, "Proposal for Arabic Script Root Zone LGR", Version 3.4, 18 November 2015 [Proposal-Arabic].
This file is one of a set of LGR files that together form an integrated LGR for the DNS Root Zone [RZ-LGR-4].
The format of this file follows [RFC 7940].
Repertoire
The repertoire is described in Section 3.2 of [Proposal-Arabic] and
includes only the 128 code points used by languages that are actively written in the Arabic script. It
excludes code points for which TF-AIDN was unable to find sufficient evidence of use (see Appendix F in [Proposal-Arabic]). The repertoire is
based on [MSR-4], which is a subset of [Unicode 6.3].
This LGR does not include combining marks or code point sequences. All combining marks have been
excluded for these reasons:
- First, they can significantly overproduce and would require additional rules to contain them effectively,
complicating the design.
- Second, even where they are required for some languages, they are optional for others.
- Third, excluding them also avoids the issue, raised by [IAB], of duplication between some precomposed code points and combining sequences.
For further details, see Section 3.2, "Code point repertoire included", in [Proposal-Arabic].
As part of the Root Zone, this LGR includes neither digits nor the HYPHEN-MINUS.
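The exclusions described above (combining marks, digits, and HYPHEN-MINUS) can be illustrated with a small pre-filter. This is a minimal sketch in Python, not part of the LGR: the function name excluded_code_points is hypothetical, and the check relies only on Unicode general categories. It does not test membership in the actual repertoire, which is defined by the code point listing.

```python
import unicodedata
from typing import List


def excluded_code_points(label: str) -> List[str]:
    """Return the characters of `label` that fall into an excluded group.

    Assumption for illustration: "combining marks" are approximated by the
    Unicode general categories Mn, Mc and Me, and "digits" by category Nd.
    """
    excluded = []
    for ch in label:
        category = unicodedata.category(ch)
        if category.startswith("M"):   # combining marks
            excluded.append(ch)
        elif category == "Nd":         # decimal digits of any script
            excluded.append(ch)
        elif ch == "\u002D":           # HYPHEN-MINUS
            excluded.append(ch)
    return excluded
```

A label that passes this filter may still be outside the repertoire; the filter only catches the three groups named above.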
Each code point or range is tagged with the script or scripts that the code point is used with, and one or more
references documenting sufficient justification for inclusion in the repertoire; see "References" below.
Comments identify the languages using the code point.
Variants
This LGR includes "blocked" and "allocatable" variants, assigned according to Section 4,
"Final recommendation of variants for Top Level Domains (TLDs)" in [Proposal-Arabic].
These recommendations balance the desire to minimize the number of possible allocatable variants with the need to keep the
definition of variants simple. See also the comments given in the listing.
The specification of variants in the Root Zone LGR follows the guidelines in [RFC 8228].
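To make the distinction between "blocked" and "allocatable" variants concrete, the following Python sketch enumerates the variant labels of an original label and assigns each a disposition. It is a simplification under stated assumptions: the VARIANTS table and its variant types are hypothetical and chosen only for illustration (the real mappings and types are in the listing), and the disposition logic is reduced to "blocked if any blocked mapping is used, otherwise allocatable", which is not the full action set of this LGR.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL variant table: (variant code point, variant type).
# The types shown here are illustrative, not taken from the LGR listing.
VARIANTS: Dict[str, List[Tuple[str, str]]] = {
    "\u0643": [("\u06A9", "allocatable")],  # KAF -> KEHEH
    "\u06A9": [("\u0643", "allocatable")],  # KEHEH -> KAF
    "\u0647": [("\u06D5", "blocked")],      # HEH -> AE
    "\u06D5": [("\u0647", "blocked")],      # AE -> HEH
}


def variant_labels(label: str) -> Dict[str, str]:
    """Map each variant label of `label` to a simplified disposition."""
    # At each position, either keep the original code point (type None)
    # or substitute one of its variants.
    choices: List[List[Tuple[str, Optional[str]]]] = []
    for ch in label:
        options: List[Tuple[str, Optional[str]]] = [(ch, None)]
        options.extend(VARIANTS.get(ch, []))
        choices.append(options)

    dispositions: Dict[str, str] = {}
    for combo in product(*choices):
        candidate = "".join(cp for cp, _ in combo)
        if candidate == label:
            continue  # the original label is not its own variant here
        types = {t for _, t in combo if t is not None}
        # Simplified rule: one blocked mapping blocks the whole label.
        dispositions[candidate] = "blocked" if "blocked" in types else "allocatable"
    return dispositions
```

For the two-letter label KAF HEH, this sketch yields three variant labels, only one of which (KEHEH HEH) would be allocatable under the hypothetical table.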
Character Classes
This LGR does not define named character classes.
Whole Label Evaluation (WLE) and Context Rules
Default Whole Label Evaluation Rules and Actions
The LGR includes the set of required default WLE rules and actions applicable to the Root Zone and defined in [MSR-4]. They are marked with ⍟.
Arabic-specific Rules and Actions
This LGR includes WLE rules and actions specific to the Arabic script. See Section 5, "Whole Label Evaluation (WLE) rules", in [Proposal-Arabic].
As specified, the rules and actions serve to prevent the mixing of two variants of the same code point within the same label. This reduces overproduction
of variant labels. The rules are listed here with the numbers given in Table 17 in [Proposal-Arabic]. See also the comments given for each rule or action.
- no-mix-kaf-keheh — WLE Rule 1: do not mix Arabic letters KAF and KEHEH in the same label
- no-mix-kaf-swash — WLE Rule 2: do not mix Arabic letters KAF and SWASH KAF in the same label
- no-mix-alef-maksura-farsi-yeh — WLE Rule 3: do not mix Arabic letters ALEF MAKSURA and FARSI YEH in the same label
- no-mix-heh-goal — WLE Rule 4: do not mix Arabic letters HEH and HEH GOAL in the same label
- no-mix-heh-goal-ae — WLE Rule 5: do not mix Arabic letters HEH GOAL and AE in the same label
- no-mix-heh-ae — WLE Rule 6: do not mix Arabic letters HEH and AE in the same label
- no-mix-heh-doachashmee — WLE Rule 7: do not mix Arabic letters HEH and HEH DOACHASHMEE in the same label
- no-mix-teh-marbuta-goal — WLE Rule 8: do not mix Arabic letters TEH MARBUTA and TEH MARBUTA GOAL in the same label
- no-mix-noon-with-three-dots-above-yeh-with-three-dots-below — WLE Rule 9: do not mix Arabic letters NOON WITH THREE DOTS ABOVE and YEH WITH THREE DOTS BELOW in the same label
- no-mix-peh-noon-with-three-dots-above — WLE Rule 10: do not mix Arabic letters PEH and NOON WITH THREE DOTS ABOVE in the same label
- no-mix-feh-with-dot-moved-below — WLE Rule 11: do not mix Arabic letters FEH and FEH WITH DOT MOVED BELOW in the same label
- no-mix-qaf-with-dot-above — WLE Rule 12: do not mix Arabic letters QAF and QAF WITH DOT ABOVE in the same label
- no-mix-feh-qaf-with-dot-above — WLE Rule 13: do not mix Arabic letters FEH and QAF WITH DOT ABOVE in the same label
- no-mix-kaf-with-ring-gaf — WLE Rule 14: do not mix Arabic letters KAF WITH RING and GAF in the same label
- no-mix-kaf-with-ring-keheh-with-three-dots-above — WLE Rule 15: do not mix Arabic letters KAF WITH RING and KEHEH WITH THREE DOTS ABOVE in the same label
- no-mix-gaf-keheh-with-three-dots-above — WLE Rule 16: do not mix Arabic letters GAF and KEHEH WITH THREE DOTS ABOVE in the same label
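The effect of the no-mix rules listed above can be sketched in Python as a check for whether both members of a pair occur in the same label. This is an illustration only: the function violated_rules and the helper _letter are hypothetical names, only the first four rules are included, and the code does not reproduce how the LGR expresses these rules and their associated actions in XML. Code points are resolved from their Unicode character names.

```python
import unicodedata
from typing import Dict, List, Tuple


def _letter(name: str) -> str:
    """Resolve an Arabic letter by its Unicode character name."""
    return unicodedata.lookup("ARABIC LETTER " + name)


# Subset of the rules above: rule identifier -> pair that must not co-occur.
NO_MIX_RULES: Dict[str, Tuple[str, str]] = {
    "no-mix-kaf-keheh": (_letter("KAF"), _letter("KEHEH")),
    "no-mix-kaf-swash": (_letter("KAF"), _letter("SWASH KAF")),
    "no-mix-alef-maksura-farsi-yeh": (_letter("ALEF MAKSURA"), _letter("FARSI YEH")),
    "no-mix-heh-goal": (_letter("HEH"), _letter("HEH GOAL")),
}


def violated_rules(label: str) -> List[str]:
    """Return the identifiers of all no-mix rules that `label` violates."""
    present = set(label)
    return [
        rule_id
        for rule_id, (first, second) in NO_MIX_RULES.items()
        if first in present and second in present
    ]
```

A label containing both KAF (U+0643) and KEHEH (U+06A9) is reported under no-mix-kaf-keheh, while a label containing only one of the two passes the check.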
Methodology and Contributors
The Root Zone LGR for the Arabic script was developed by the Task Force for Arabic Script IDNs [TF-AIDN].
For more information and for methodology and contributors, see [Proposal-Arabic], as well as [RZ-LGR-4-Overview].
References
The following general references are cited in this document:
- [IAB]
- Internet Architecture Board (IAB), "IAB Statement on Identifiers and Unicode 7.0.0"
https://www.iab.org/documents/correspondence-reports-documents/2015-2/iab-statement-on-identifiers-and-unicode-7-0-0/
- [MSR-4]
- Integration Panel, "Maximal Starting Repertoire — MSR-4 Overview and Rationale",
25 January 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf
- [Proposal-Arabic]
- TF-AIDN, "Proposal for Arabic Script Root Zone LGR", Version 3.4, 18 November 2015 https://www.icann.org/en/system/files/files/arabic-lgr-proposal-18nov15-en.pdf
- [RFC 6365]
- Hoffman, P. and J. Klensin, "Terminology Used in Internationalization in the IETF", BCP 166, RFC 6365, DOI 10.17487/RFC6365, September 2011, http://www.rfc-editor.org/info/rfc6365
- [RFC 7940]
- Davies, K. and A. Freytag, "Representing Label Generation Rulesets Using XML", RFC 7940, August 2016, http://www.rfc-editor.org/info/rfc7940
- [RFC 8228]
- Freytag, A., "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", RFC 8228, August 2017,
https://www.rfc-editor.org/info/rfc8228
- [RZ-LGR-4-Overview]
- Integration Panel, "Root Zone Label Generation Rules - LGR-4: Overview and Summary", 05 November 2020 (PDF), https://www.icann.org/sites/default/files/lgr/lgr-4-overview-05nov20-en.pdf
- [RZ-LGR-4]
- Integration Panel, "Label Generation Rules for the Root Zone — LGR-4", 05 November 2020 (XML), https://www.icann.org/sites/default/files/lgr/lgr-4-common-05nov20-en.xml
non-normative HTML presentation: https://www.icann.org/sites/default/files/lgr/lgr-4-common-05nov20-en.html
- [Unicode 6.3]
- The Unicode Consortium. The Unicode Standard, Version 6.3.0, (Mountain View, CA: The Unicode Consortium, 2013. ISBN 978-1-936213-08-5)
http://www.unicode.org/versions/Unicode6.3.0/
- [TF-AIDN]
- Blog, "Task Force for Arabic Script IDNs" https://www.icann.org/news/blog/what-is-the-task-force-on-arabic-script-idns-tf-aidn-up-to
For references consulted specifically in designing the repertoire for the Arabic script for the Root Zone,
see the Table of References below.
References [0] to [12] refer to the Unicode Standard versions in which the
corresponding code points were initially encoded. References [100] and above correspond to sources
given in [Proposal-Arabic] justifying the inclusion of the corresponding code points. Entries in the table may have
multiple source reference values.
]]>