Name: Gopal Tadepalli
Date: 9 Feb 2022
Affiliation: Anna University
Original Public Comment: Additional Unicode Scripts for Support in Internationalized Domain Names
1. Based on the discussion in the report, should ICANN org support IDNs at the second level in the scripts identified as Limited Use by Unicode in UAX#31, where specific scripts will be finalized on a case-to-case basis using the criteria in the report?
Yes, Limited Use scripts should be supported at the second level on a case-to-case basis based on the criteria in the report.
If you selected “no,” Limited Use scripts should not be supported for IDNs at the second level, please explain why? If you selected “yes,” are there any specific script(s) that should be clearly supported? If so, please list them and explain why?
The Unicode Consortium has sometimes used the likelihood of a combination of characters actually appearing in a natural language as a criterion for safety. DNS names, however, are often fabrications: abbreviations, strings deliberately formed to be unusual, members of a series sequenced by numbers or other characters, and so on. Consequently, a criterion that considers a change safe if it would not be visible in properly constructed running text is not helpful for DNS purposes: a change that is safe under that criterion could still be quite problematic for the DNS. ICANN Restricted Code Points are specifically disallowed for IDN registrations. IDNs at the second level in the scripts identified as "Limited Use" by Unicode should adhere to the ICANN Restricted Code Points.
2. Are there any changes needed in the criteria suggested to select the Limited Use scripts for support at the second level on a case-to-case basis?
Yes, see the suggested changes specified below.
If yes, please suggest the changes in the criteria for shortlisting scripts for IDNs at the second level:
Conversion of the domain name into an ASCII-Compatible Encoding (ACE)
Homoglyph Bundling
Variants
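As an illustration of the ACE conversion named in the criteria above, the following is a minimal sketch using Python's built-in "idna" codec. Note the assumptions: that codec implements the older IDNA 2003 rules rather than IDNA 2008, the helper names to_ace and from_ace are hypothetical, and the label "bücher.example" is only a sample.

```python
def to_ace(domain: str) -> str:
    """Return the ASCII-Compatible Encoding of a domain name.

    Assumption: uses Python's built-in "idna" codec (IDNA 2003),
    which is adequate for illustration but not for registry policy.
    """
    return domain.encode("idna").decode("ascii")


def from_ace(ace_domain: str) -> str:
    """Convert an ACE domain name back to its Unicode form."""
    return ace_domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    sample = "bücher.example"   # sample label, not taken from the report
    ace = to_ace(sample)
    print(ace)                  # the non-ASCII label gains the "xn--" prefix
    print(from_ace(ace))
```

Only the label containing non-ASCII characters is converted; a purely ASCII label such as "example" passes through unchanged.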
3. Should ICANN support IDNs at the second level in scripts identified as Excluded by UAX#31 for identifiers?
No, Excluded scripts should not be supported at the second level, in line with Unicode recommendation in UAX#31.
4. Are there any additional factors which should be considered by the Integration Panel, in addition to the findings of this report using the categorization provided in the UAX#31, for shortlisting the scripts for the Root Zone Label Generation Rules?
The Variants and Confusables may need Supplementary Panel(s).
Please note that the Unicode Standard provides default algorithms for determining grapheme cluster boundaries, with two variants: legacy grapheme clusters and extended grapheme clusters. To my mind, an algorithmic approach may resolve many concerns about the number of variants. The LGRs simply need to be developed in tandem with these algorithms.
With the original 16-bit Unicode encoding, just over 65,000 characters can be encoded (2^16 = 65,536). However, the total number of characters that need to be encoded has exceeded that limit. To make room for new characters, the developers of the Unicode Standard introduced surrogate pairs. Sixteen supplementary planes are made addressable in this way.
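The surrogate-pair mechanism can be sketched as simple arithmetic. The helper names below are hypothetical, and the sample code point U+11005 (BRAHMI LETTER A, a Plane 1 character) is chosen only for illustration.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above the BMP use surrogate pairs")
    offset = cp - 0x10000               # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low surrogate
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    high, low = to_surrogates(0x11005)
    print(f"U+11005 -> {high:04X} {low:04X}")
    print(f"round trip -> U+{from_surrogates(high, low):04X}")
```

Each surrogate carries 10 bits, so a pair covers 2^20 code points beyond the 2^16 reachable with a single 16-bit unit.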
In the Unicode Standard, a plane is a contiguous group of 65,536 (2^16) code points. There are 17 planes, numbered 0 to 16. Plane 0 is the Basic Multilingual Plane (BMP), which contains the most commonly used characters. The higher planes, 1 through 16, are called "supplementary planes". The limit of 17 planes is due to UTF-16, which can encode 2^20 code points (16 planes) as pairs of 16-bit words, plus the BMP as single words. Unicode Planes 4 to 13 are unassigned, i.e., no characters have yet been assigned or proposed for assignment.
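The plane arithmetic above can be checked with a short sketch. The function name plane_of and the sample code points are illustrative assumptions, not part of the report.

```python
PLANE_SIZE = 2 ** 16            # 65,536 code points per plane
PLANE_COUNT = 17                # planes 0 through 16
MAX_CODE_POINT = 0x10FFFF


def plane_of(cp: int) -> int:
    """Return the plane number (0-16) that contains a code point."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError("not a Unicode code point")
    return cp >> 16             # the plane is the value above the low 16 bits


if __name__ == "__main__":
    # The BMP (2^16) plus the surrogate-addressable range (2^20)
    # together account for the whole code space.
    assert 2 ** 16 + 2 ** 20 == PLANE_COUNT * PLANE_SIZE == MAX_CODE_POINT + 1
    for cp in (0x0B85, 0x11005, 0x10FFFF):   # sample code points
        print(f"U+{cp:04X} is in plane {plane_of(cp)}")
```

The total code space is therefore 17 x 65,536 = 1,114,112 code points, of which only a subset is assigned to characters.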
Can ICANN open supplementary panel(s) for the Generation Panel for IDNs?
Summary of Submission
ICANN Restricted Code Points
ASCII-Compatible Encoding (ACE)
Variants - Suggested Use of Supplementary Panel(s)