During Pre-Delegation Testing (PDT) for the new gTLD program, ICANN noted a large number of IDN table submissions. The IDN tables submitted by new gTLD registries varied in their character repertoire, variant and contextual rules, and format. At the request of the community, and to improve the consistency of testing and the stability of new gTLD registry operations, ICANN intends to develop reference IDN tables for the second level in a machine-readable format, called Label Generation Rulesets (LGRs), for use in PDT and the Registry Service Evaluation Process (RSEP). ICANN is proposing a process that ensures linguistic and technical expert input as well as community review of the tables under development. The community is requested to review the proposed process, provide feedback on its effectiveness, and suggest any further improvements.
Section I: Description and Explanation
The LGRs are to be developed for the languages listed below, organized in two batches and prioritized on the basis of complexity and demand. Dividing the work into batches allows the Batch 1 languages to be released sooner and roughly halves the number of LGRs the community must review at one time. Additional languages will be added later, as needed.
- Batch 1: Japanese, Korean, Chinese, Danish, Norwegian, Latvian, Lithuanian, Russian, Arabic, Ukrainian, Belarusian, Bulgarian, Macedonian, Bosnian (in both Cyrillic and Latin scripts), Serbian, Hebrew
- Batch 2: English, Spanish, French, German, Portuguese, Polish, Swedish, Italian, Hungarian, Icelandic, Finnish, Montenegrin
The following steps describe the intended process for undertaking the work:
Development of Overall Guidelines
- As the first step, the provider will create a detailed set of guidelines and a process for undertaking the work.
- Once ICANN has approved the guidelines and process, the provider will proceed with the creation and verification of the reference IDN tables. It is recommended that the work already done by .SE be used as the baseline.
Analysis and Documentation
- For each language, the authoritative sources will be gathered and analyzed. These include national and international standards, published dictionaries and other sources identified in the guidelines.
- Based on the analysis of these authoritative sources and a review of other data (e.g. IDN tables published by IANA and .SE, and informational RFCs), the reference LGRs suitable for the second level will be created in the machine-readable format defined in https://tools.ietf.org/html/draft-davies-idntables (the XML Format). If authoritative sources are not available, a more rigorous creation process will need to be instituted, as specified in the guidelines.
- A concise document will be created alongside each language LGR, listing the authoritative and other sources used and the process followed (especially if authoritative sources are not available), and summarizing the analysis and conclusions. The document should either confirm that no deviation is needed or justify any suggested variance from the authoritative sources. Any allowable context-based variations (e.g. differences in the code points used for a language across regions) will also be documented.
- As an independent follow-up step, the reference LGR and the associated documentation will be reviewed by linguistic expert(s) in that language and script, who will confirm that the documentation and contents of the LGR for the second level are adequate and complete. A separate linguistic expert review report will be created for each LGR at this stage.
- As a further independent follow-up step, the reference LGR and the associated documentation will be reviewed by relevant technical DNS and IDN expert(s), who will confirm that the proposed documentation and contents of the LGR for the second level adequately address any security and stability concerns. A separate technical expert review report will be created for each LGR at this stage.
- The language-based reference LGRs, the associated documentation, and the linguistic and technical (security and stability) expert review reports for each language will be released by ICANN for public comment, separately for each of the two batches.
- The public comments will be considered and the reference LGRs and their associated documentation will be updated based on the feedback received.
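To illustrate what a machine-readable LGR makes possible for testing, the following is a minimal sketch, not part of the proposed process, of how a tool might read the code point repertoire from an LGR in the XML Format and check a second-level label against it. It assumes the element and attribute names (`lgr`, `data`, `char`, `range`, `cp`, `first-cp`, `last-cp`) and the namespace of the later published version of the format (RFC 7940); the draft referenced above may differ. The sample table is made up for the example and does not represent any actual reference table, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace as in the published LGR XML format (RFC 7940).
LGR_NS = {"lgr": "urn:ietf:params:xml:ns:lgr-1.0"}

# Made-up sample table for illustration only; not an actual reference LGR.
SAMPLE_LGR = """<lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0">
  <meta>
    <version>1</version>
  </meta>
  <data>
    <range first-cp="0061" last-cp="007A"/>
    <char cp="00E5"/>
    <char cp="00E4"/>
    <char cp="00F6"/>
  </data>
</lgr>"""


def load_repertoire(xml_text):
    """Return the repertoire as a set of code point tuples."""
    root = ET.fromstring(xml_text)
    repertoire = set()
    # A <char> lists one or more space-separated hexadecimal code points.
    for char in root.findall("lgr:data/lgr:char", LGR_NS):
        repertoire.add(tuple(int(cp, 16) for cp in char.get("cp").split()))
    # A <range> covers every code point from first-cp to last-cp inclusive.
    for rng in root.findall("lgr:data/lgr:range", LGR_NS):
        first = int(rng.get("first-cp"), 16)
        last = int(rng.get("last-cp"), 16)
        repertoire.update((cp,) for cp in range(first, last + 1))
    return repertoire


def label_in_repertoire(label, repertoire):
    """Check each character of a label against single code point entries.

    This ignores code point sequences, variants, and contextual rules,
    all of which a complete LGR processor would also have to evaluate.
    """
    return all((ord(ch),) in repertoire for ch in label)


repertoire = load_repertoire(SAMPLE_LGR)
print(len(repertoire))
print(label_in_repertoire("smörgås", repertoire))
print(label_in_repertoire("straße", repertoire))
```

With the sample table, the first label consists only of listed code points and passes the check, while the second contains U+00DF, which the table does not list. A shared format lets a check of this kind be run identically against every registry's table, which is the consistency benefit the reference LGRs aim for.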
Finalization and Publication
- The final set of LGRs and associated documents will be released by ICANN as a reference for PDT and RSEP, under an open license for use by the community.
Section II: Background
Registries are generally encouraged to collaborate in defining common language-based or script-based tables, to provide consistency for end users. IDN tables can be submitted in multiple formats, and applicants may also use a format of their own. The IDN tables used by each gTLD and some ccTLDs are posted at the IANA Repository for IDN Practices.
Section III: Relevant Resources
The IANA Repository for IDN Practices: https://www.iana.org/domains/idn-tables
The machine-readable format for Label Generation Rulesets: https://tools.ietf.org/html/draft-davies-idntables
IDN tables and associated information made available by .SE: https://github.com/dotse/IDN-ref-tables