
Internationalization and Localization

If there is one question or concern related to IDNs that has been raised continuously over the past year, it is that of internationalizing versus localizing the domain name space. It comes up in questions such as:

“Well, but I don’t have a keyboard that enables me to type in all these new characters, so I cannot type in these IDN addresses, why are you allowing this to happen?”

“I don’t understand <insert any language> so IDNs will keep me from accessing and using the Internet like I used to, that is not a positive development, why are you doing this?”

The answers are really quite simple. Domain names are not being internationalized so that all users across the world can type in all domain names. It is done to ease local communication and accessibility. It is called “internationalization” (and not “localization”) because the Internet and the DNS need to work on a global, international level. However, it is the individual user's or business's choice which characters (within the current standard and registry offerings) their domain name should contain.


Think of it this way: if, for example, you are launching a website with information or services for Russian users, with Russian content, then it is quite natural for the domain name in the web URL to be in Cyrillic characters that Russian users are able to understand and use. Users who do not understand Russian most likely would not be interested in accessing the site, would not see any advertisement about the site, and hence have no need to be able to type in the URL and go to the site. The same arguments can of course be made for other languages.

To stress the point further: today there is a large amount of content online in various languages, specifically targeted at local language communities. The fact that domain names so far are internationalized only at the second level (i.e. the “icann” part of the domain name “icann.org”) makes it impossible in many cases for the web URL to contain characters solely from one language.

It can be argued that search engines are to some extent solving these problems for many users, as search engines are becoming widely localized; however, this is not always sufficient. It can be quite confusing, especially when offline media in a local language lists a URL containing characters unfamiliar to its readers. For the example above, it is confusing to have an address written in Latin characters appear in Russian-language offline media printed in Cyrillic, where the reader needs to reconstruct that address later when online. I can easily imagine how difficult it would be if I had to remember an address in Chinese characters (I don’t read or understand Chinese) and how much trouble I would have if I needed to reconstruct that address later online.

I am not trying to reopen the discussion of whether we need IDNs or not. It is clear that we do, and the main focus of ICANN’s IDN Program is introducing Internationalized TLDs. But sometimes the reasoning behind why we need IDNs is misunderstood, and in any event questions such as those above are the ones I have answered most often during the past year. The primary message of this post is therefore to restate that IDNs and IDN TLDs are not being deployed with the intent that all users have, and are able to use, all characters allowed in domain names. They are being deployed to make choices available to users: choices that fit local communication better than the basic Latin characters that were originally the only option for domain names.

(The Internet and internationalization are obviously not “just” about web URLs, but URLs are used above for ease of illustration.)

Comments

    Andy  22:33 UTC on 14 May 2018

    I think it is always best for local search-ability to have local characters, but it can be difficult to find exactly which domains are best without local or SME input.

Domain Name System
Internationalized Domain Name (IDN)

IDNs are domain names that include characters used in the local representation of languages that are not written with the twenty-six letters of the basic Latin alphabet "a-z". An IDN can contain Latin letters with diacritical marks, as required by many European languages, or may consist of characters from non-Latin scripts such as Arabic or Chinese. Many languages also use other types of digits than the European "0-9". The basic Latin alphabet together with the European-Arabic digits are, for the purpose of domain names, termed "ASCII characters" (ASCII = American Standard Code for Information Interchange). These are also included in the broader range of "Unicode characters" that provides the basis for IDNs.

The "hostname rule" requires that all domain names of the type under consideration here are stored in the DNS using only the ASCII characters listed above, with the one further addition of the hyphen "-". The Unicode form of an IDN therefore requires special encoding before it is entered into the DNS.

The following terminology is used when distinguishing between these forms: A domain name consists of a series of "labels" (separated by "dots"). The ASCII form of an IDN label is termed an "A-label". All operations defined in the DNS protocol use A-labels exclusively. The Unicode form, which a user expects to be displayed, is termed a "U-label". The difference may be illustrated with the Hindi word for "test", परीका, appearing here as a U-label would (in the Devanagari script). A special form of "ASCII compatible encoding" (abbreviated ACE) is applied to this to produce the corresponding A-label: xn--11b5bs1di.

A domain name that only includes ASCII letters, digits, and hyphens is termed an "LDH label". Although the definitions of A-labels and LDH labels overlap, a name consisting exclusively of LDH labels, such as "icann.org", is not an IDN.
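The relationship between U-labels and A-labels described in this entry can be tried out in a few lines of Python. This is only a minimal sketch: it relies on Python's built-in "idna" codec, which implements the older IDNA 2003 rules (RFC 3490) rather than the current IDNA 2008 rules that registries apply, and the helper names to_a_label, to_u_label and is_idn are made up for this illustration. The domain "परीका.example" is likewise just a placeholder.

```python
def to_a_label(u_label: str) -> str:
    """Convert a Unicode label (U-label) to its ASCII form (A-label)."""
    # The built-in "idna" codec applies the ASCII compatible encoding
    # and adds the "xn--" prefix when the label is not plain ASCII.
    return u_label.encode("idna").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Convert an ASCII label (A-label) back to its Unicode form."""
    return a_label.encode("ascii").decode("idna")


def is_idn(domain: str) -> bool:
    """Rough check: does any label of the name need an A-label form?"""
    # Assumption for this sketch: a name counts as an IDN when at least
    # one of its labels is encoded with the "xn--" prefix.
    encoded = domain.encode("idna").decode("ascii")
    return any(label.lower().startswith("xn--") for label in encoded.split("."))


if __name__ == "__main__":
    print(to_a_label("परीका"))          # xn--11b5bs1di
    print(to_u_label("xn--11b5bs1di"))  # परीका
    print(to_a_label("icann"))          # icann (an LDH label is left unchanged)
    print(is_idn("icann.org"))          # False
    print(is_idn("परीका.example"))      # True
```

The last two calls mirror the closing point of the entry: a name made up only of LDH labels, such as "icann.org", passes through unchanged and is not treated as an IDN.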