Name: Nacho Amadoz
Date: 22 Nov 2021
Affiliation: CORE Association
Original Public Comment: Proposal for Latin Script Root Zone Label Generation Rules
1. Overall, does the proposal meet its goal of defining Internationalized Domain Name (IDN) labels for the Latin script that are suitable for the root zone?
If you selected no, please explain why.
We disagree with the general exclusion of diacritics as variants of the base characters. That is, a and a with acute or grave accent, or a and a with tilde, are not considered variants. We have the impression that this exclusion is not warranted in the real world of DNS usage. The fact is that if we were to ask any user whether sãopaulo.tld, méxico.tld or québec.tld are the same domains as saopaulo.tld, mexico.tld and quebec.tld, everybody (except perhaps some IDN experts, typographers or language teachers) would say they are the same. The reason is not simply visual. It is the obstinate reality of over 25 years of strong conviction that words with diacritics are written without them in domain names. It is fair to say that as an industry we have failed so far to make IDNs comprehensible and easy to use. But the net result is what it is: users will universally see the string without the diacritic as the “main” version of the one with the diacritic. Perhaps not in German, where Müller.tld "may" also be mueller.tld, but it is certainly the overwhelming view of Latin-script users as of today. We believe that the standard that should be applied when considering variants is not only, and not mainly, the perception of, say, typography experts or IDN experts but, rather, that of the average DNS user. We also believe that the result seems guided by the principle of limiting the number of variants, but in our opinion the goal should be to serve the best interests of users and prevent excessive confusion, not to settle a quantitative, aprioristic question of principle. This point about diacritics has a clear and direct implication for existing TLDs, for instance .quebec, managed by a customer of CORE. It is evident to anyone in Québec, and to any DNS user aware of what a diacritic is, that .quebec and .québec simply cannot be two different things, two different TLDs.
They must be a single TLD, managed by the same Registry, where second-level domains in both the accented and unaccented versions must belong to the same Registrants, managed by the same Registrars, with the same Nameservers…. In practical terms, .quebec and .québec must be variants of the same TLD, as anything else would be closer to fraud than to confusion. The point is not that they are similar; it is that they cannot be perceived as anything other than variants of the same thing. We therefore request that the question of diacritics be revisited with a view to accepting them as variants when warranted.
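To illustrate the user perception described above, the following is a minimal sketch, not the mechanism used by the Root Zone LGR. It assumes that "the string without the diacritic" can be approximated by Unicode canonical decomposition followed by removal of combining marks. The function names fold_diacritics and perceived_as_same are hypothetical and were chosen only for this example.

```python
import unicodedata


def fold_diacritics(label: str) -> str:
    """Return the label with combining marks removed (illustrative helper)."""
    # NFD splits a precomposed letter such as "é" into "e" + U+0301.
    decomposed = unicodedata.normalize("NFD", label)
    # unicodedata.combining() is non-zero for combining marks; drop those.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def perceived_as_same(a: str, b: str) -> bool:
    """True when two labels differ only by case or by diacritics."""
    return fold_diacritics(a.lower()) == fold_diacritics(b.lower())


if __name__ == "__main__":
    for with_mark, without in [("sãopaulo", "saopaulo"),
                               ("méxico", "mexico"),
                               ("québec", "quebec")]:
        print(with_mark, without, perceived_as_same(with_mark, without))
```

Under this assumption the three example pairs compare as equal. The German convention mentioned above is deliberately not covered: folding Müller this way yields muller, not mueller, so language-specific transliteration would need separate handling.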
2. In your view, are there any required technical changes to the proposal? Please list them and include explanations.
We respectfully disagree with the exclusion of some codepoints which are, for example, perfectly acceptable according to IDNA 2008 and used at the second level by many Registries without any known issue. One such case is the “ela geminada” of the Catalan language, which uses the middle dot (U+00B7) between two instances of the Latin letter "l" (U+006C).
The reason seems to be clear but purely formal: the “middle dot” is in a given list and not another, and needs some “context”. But the former argument is based on history more than function, and the latter is weak, because there is no real “context” in the sense of the codepoint behaving differently in different locations, before or after given codepoints, etc. It is a simple, fixed and immutable rule: the middle dot can only be used between two instances of the Latin letter "l". Nothing else. We understand the preference for a conservative approach, and if the discussion had lasted for 2 hours, we would understand the outcome. But after so many years, we expect that aprioristic questions of principle should not prevail without any real check of “how it really works”. We therefore request that the discussion on this specific codepoint be reopened.
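How simple the rule is can be shown in a few lines. The sketch below reflects our understanding of the IDNA 2008 contextual rule for the middle dot (RFC 5892, Appendix A.3): U+00B7 is acceptable only when the codepoints immediately before and after it are both U+006C. The function name middle_dot_context_ok is hypothetical, and the check covers only this one rule, not full IDNA validation.

```python
MIDDLE_DOT = "\u00b7"     # U+00B7 MIDDLE DOT
LATIN_SMALL_L = "\u006c"  # U+006C LATIN SMALL LETTER L


def middle_dot_context_ok(label: str) -> bool:
    """Return True if every middle dot in the label sits between two 'l'."""
    for i, ch in enumerate(label):
        if ch != MIDDLE_DOT:
            continue
        # A middle dot at the start or end has no neighbour on one side.
        if i == 0 or i == len(label) - 1:
            return False
        # Both neighbours must be exactly U+006C.
        if label[i - 1] != LATIN_SMALL_L or label[i + 1] != LATIN_SMALL_L:
            return False
    return True


if __name__ == "__main__":
    for candidate in ["paral·lel", "para·lel", "·ll", "ll·", "catalunya"]:
        print(candidate, middle_dot_context_ok(candidate))
```

Labels without a middle dot pass trivially, and the only accepted placement is the "l·l" sequence, which is the fixed behaviour described above.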
We would like to express our respect and gratitude to all individuals involved in this effort. We also thank them for taking into account a large number of languages, going beyond the ISO 639-1 list and well into the ISO 639-2 one. Nevertheless, we find it unnecessary and unfortunate to use the EGIDS Scale to “rank” languages in the provided documentation. The EGIDS Scale, as developed by SIL/Ethnologue.com, may be, besides quite controversial, very useful in the sociolinguistic analysis of a given language's “health”. But using it here seems like establishing a layered status for languages for IDN or DNS usage. In fact, what counts is whether this work has taken into account the characters/codepoints used in a given language, but “ranking” them in your Appendices seems as odd as IANA listing ccTLDs in the root not in alphabetical order but by GDP per capita or by COVID-19 vaccination ratios: both confusing and unwarranted. Please list them in alphabetical order of their ISO 639-2 code.
Summary of Submission
- consideration of diacritics as variants
- inclusion of additional code points such as middle dot
- listing of languages in strict alphabetical order