closed Additional Unicode Scripts for Support in Internationalized Domain Names
What We Received Input On
Based on its mission to support Internationalized Domain Names (IDNs), ICANN org has been developing resources for the 28 widely used scripts shortlisted in the Maximal Starting Repertoire (MSR). As this work matures, ICANN org will work on any additional scripts for which IDNs need to be supported. The Unicode standard encodes 159 scripts in its latest version, 14.0.0. In Unicode Standard Annex #31 (UAX#31), the standard categorizes these 159 scripts as Recommended, Limited Use, or Excluded for use in defining identifiers. Because IDNs are also identifiers based on the Unicode standard, this categorization is relevant to both second-level and top-level IDN labels.
An initial analysis conducted by experts, documented in a report on Evaluating Unicode Scripts for Use in IDNs, suggests that only Recommended scripts, excluding Bopomofo, be considered for top-level IDN labels, due to the conservative nature of the Root Zone. This aligns with the current version of the MSR. Any additional scripts proposed for future versions of the MSR would need to be considered by the Integration Panel as per the procedure to develop and maintain the Root Zone Label Generation Rules (RZ-LGR Procedure). According to the report, Recommended scripts are also suitable for second-level domain names. Scripts classified as Limited Use in the Unicode standard are generally suitable for second-level domain names, but should be evaluated on a case-by-case basis; the report lists the factors that could be used in making such a decision. The Excluded set includes scripts that are unsuitable for identifiers on multiple grounds, including technical reasons and the absence of a modern native user community that would be able to use these scripts for useful mnemonic identifiers in a familiar language. Therefore, the report does not recommend the use of Excluded scripts in domain names at either the second level or the top level.
The community is being asked to provide feedback on the analysis and the recommendations presented in the report on Evaluating Unicode Scripts for Use in IDNs to guide ICANN org’s continuing work on the implementation of IDNs.
Proposals For Your Input
It is part of ICANN’s Mission “to adopt or implement policies or procedures that take into account the use of domain names as natural-language identifiers” while preserving the security and stability of the Domain Name System (DNS). In line with this mission, ICANN org has been working with the support and guidance of the community to implement IDNs and so provide broader access to the DNS. Following the RZ-LGR Procedure, the RZ-LGR has been developed and currently integrates 18 of the 28 scripts shortlisted in the MSR, while most of the remaining script proposals are either in progress or published and awaiting integration into the RZ-LGR. In addition, multiple language-based and script-based Reference LGRs for the Second Level, covering these 28 scripts, have been published for the community. As the work on developing resources for these 28 scripts matures, ICANN org will work on any additional scripts for which IDNs need to be supported.
The Unicode standard encodes 159 scripts in its latest version, 14.0.0. Unicode Standard Annex #31: Unicode Identifier and Pattern Syntax (UAX#31), which “forms an integral part of the Unicode Standard,” “describes specifications for recommended defaults for the use of Unicode in the definitions of general-purpose identifiers, …” UAX#31 categorizes the 159 scripts encoded by the Unicode standard as Recommended, Limited Use, or Excluded for use in identifiers, providing the following details:
- Recommended Scripts: “generally recommended for use in identifiers” because “these are in widespread modern customary use or are regional scripts in modern customary use by large communities.”
- Limited Use Scripts: “in more limited use,” so “to avoid security issues, some implementations may wish to disallow the limited-use scripts in identifiers.”
- Excluded Scripts: “not in customary modern use, and thus implementations may want to exclude them from identifiers” because “these include historic and obsolete scripts, scripts used mostly liturgically, and regional scripts used only in very small communities or with very limited current usage. Some scripts also have unresolved architectural issues that make them currently unsuitable for identifiers.”
ICANN org engaged experts to review the scripts and their Unicode categorization to determine the suitability of the scripts for IDNs. They have analyzed the scripts based on their usage by language communities, documented their analysis and made some recommendations in the report on Evaluating Unicode Scripts for Use in IDNs. The report reviews the classification of scripts undertaken in UAX#31 and explains how this classification may be applicable for determining the suitability for use of a script for IDNs at the top-level or at the second level.
Expert analysis in the report on Evaluating Unicode Scripts for Use in IDNs suggests that only Recommended scripts, excluding Bopomofo, be considered for the Root Zone, where the final decision on inclusion of scripts lies with the Integration Panel, as per the procedure to develop and maintain the Root Zone Label Generation Rules (RZ-LGR Procedure). According to the report, Recommended scripts are also suitable for second-level IDN labels. Scripts classified as Limited Use are generally suitable for the second level, but should be evaluated on a case-by-case basis; the report lists the factors that would go into making such a decision. Limited Use scripts are not suitable for the RZ-LGR.
The Excluded set combines scripts that are categorically unsuitable for identifiers on technical grounds with scripts that may be challenging to implement and for which limited information is available. Common to all Excluded scripts is the absence of a modern native user community that would be able to use these scripts for useful mnemonic identifiers in a familiar language. Together with the often problematic and little-understood nature of these scripts, this makes them unattractive targets for developing the requisite label generation rules for IDN labels at either the top level or the second level.
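The report's recommendations for the three UAX#31 categories can be read as a small decision table. The following Python sketch is illustrative only: the names `Category`, `Suitability`, `TOP_LEVEL_EXCEPTIONS`, and `suitability` are hypothetical and do not come from the report or from any ICANN tool, and the category of each script is supplied by the caller rather than looked up from Unicode data. A "candidate" result for the top level means only that the script may be considered; the final decision rests with the Integration Panel under the RZ-LGR Procedure.

```python
from enum import Enum


class Category(Enum):
    """UAX#31 script categories as named in the text above."""
    RECOMMENDED = "Recommended"
    LIMITED_USE = "Limited Use"
    EXCLUDED = "Excluded"


class Suitability(Enum):
    """Hypothetical labels for the report's recommendations."""
    CANDIDATE = "candidate"
    CASE_BY_CASE = "case-by-case evaluation"
    NOT_RECOMMENDED = "not recommended"


# Recommended scripts that the report sets aside for the top level.
# The text above names only Bopomofo.
TOP_LEVEL_EXCEPTIONS = {"Bopomofo"}


def suitability(script: str, category: Category) -> dict:
    """Map a script and its caller-supplied category to the report's
    recommendation for top-level and second-level IDN labels."""
    if category is Category.RECOMMENDED:
        top = (
            Suitability.NOT_RECOMMENDED
            if script in TOP_LEVEL_EXCEPTIONS
            else Suitability.CANDIDATE
        )
        # Assumption: a literal reading in which every Recommended
        # script, including Bopomofo, is suitable at the second level.
        return {"top_level": top, "second_level": Suitability.CANDIDATE}
    if category is Category.LIMITED_USE:
        return {
            "top_level": Suitability.NOT_RECOMMENDED,
            "second_level": Suitability.CASE_BY_CASE,
        }
    # Excluded scripts are not recommended at either level.
    return {
        "top_level": Suitability.NOT_RECOMMENDED,
        "second_level": Suitability.NOT_RECOMMENDED,
    }


if __name__ == "__main__":
    for name, cat in [
        ("Latin", Category.RECOMMENDED),
        ("Bopomofo", Category.RECOMMENDED),
    ]:
        result = suitability(name, cat)
        print(name, result["top_level"].value, result["second_level"].value)
```

Keeping the category as an input, rather than hard-coding a script list, reflects the point that the Unicode classification may change over time.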
The Unicode Technical Committee may occasionally revisit this classification as the use of these scripts evolves. With guidance from the community, ICANN org will follow the categorization provided in the Unicode standard as it evolves.