
IDNA Protocol Review and Proposals for Changes

As a follow-up to RFC 4690, which was made available in September and announced on 19 October 2006, a review of the IDNA protocol is under way.

The review is managed through the IAB and the IETF. It was initiated to move the protocol from Unicode version 3.2 (the latest version when the protocol was developed in 2003) to the current Unicode version 5.0. Version 5.0 contains many more characters, which would otherwise not be available for use in domain names.

As previously announced, the basic framework for this review was published in September as RFC 4690. That document discusses language-specific character issues where the same script is used across different languages, cases where a language can be written in more than one script, bidirectional text, and visually confusable characters.

In response to some of these issues, a set of Internet-Drafts proposing solutions and revisions to the IDNA protocol was released publicly in the past couple of weeks. ICANN urges all interested Internet users to take part in this development work. Of particular importance, the proposals include revising the IDNA protocol to be based on an inclusion list of characters. The three Internet-Drafts can be found via the links below, and the last link provides an initial version of the inclusion list. Work still under way includes a process by which the community will be able to add characters to this list. If these changes are accepted and implemented, not all characters will initially be available for registration in domain names; only those that pass through that process will be made available.

For more information about IDNs see

Internationalized Domain Name (IDN)

IDNs are domain names that include characters used in the local representation of languages that are not written with the twenty-six letters of the basic Latin alphabet "a-z". An IDN can contain Latin letters with diacritical marks, as required by many European languages, or may consist of characters from non-Latin scripts such as Arabic or Chinese. Many languages also use other types of digits than the European "0-9". The basic Latin alphabet together with the European-Arabic digits are, for the purpose of domain names, termed "ASCII characters" (ASCII = American Standard Code for Information Interchange). These are also included in the broader range of "Unicode characters" that provides the basis for IDNs.

The "hostname rule" requires that all domain names of the type under consideration here are stored in the DNS using only the ASCII characters listed above, with the one further addition of the hyphen "-". The Unicode form of an IDN therefore requires special encoding before it is entered into the DNS.

The following terminology is used when distinguishing between these forms. A domain name consists of a series of "labels" (separated by "dots"). The ASCII form of an IDN label is termed an "A-label". All operations defined in the DNS protocol use A-labels exclusively. The Unicode form, which a user expects to be displayed, is termed a "U-label". The difference may be illustrated with the Hindi word for "test", परीका, appearing here as a U-label would (in the Devanagari script). A special form of "ASCII compatible encoding" (abbreviated ACE) is applied to this to produce the corresponding A-label: xn--11b5bs1di.

A domain name that only includes ASCII letters, digits, and hyphens is termed an "LDH label". Although the definitions of A-labels and LDH labels overlap, a name consisting exclusively of LDH labels is not an IDN.
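The relationship between a U-label and its A-label can be sketched in a few lines of Python. This is only an illustration, assuming Python's built-in "punycode" codec stands in for the ACE step; the helper names (to_a_label, to_u_label, is_ldh_label) are hypothetical and not part of any IDNA specification, and the sketch performs none of the character validation (such as an inclusion-list check) that the protocol requires.

```python
import re

ACE_PREFIX = "xn--"                        # prefix that marks an A-label
LDH_RE = re.compile(r"[A-Za-z0-9-]+")      # letters, digits, hyphen only


def to_a_label(u_label: str) -> str:
    """Encode a U-label as an A-label (assumption: plain Punycode, no validation)."""
    if u_label.isascii():
        return u_label                     # already ASCII, nothing to encode
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Decode an A-label back to its Unicode form."""
    if not a_label.lower().startswith(ACE_PREFIX):
        return a_label                     # ordinary LDH label, leave untouched
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def is_ldh_label(label: str) -> bool:
    """True if the label uses only ASCII letters, digits, and hyphens."""
    return LDH_RE.fullmatch(label) is not None


if __name__ == "__main__":
    u_label = "परीका"                      # the example label from the text above
    a_label = to_a_label(u_label)
    print(a_label)                         # xn--11b5bs1di
    print(to_u_label(a_label) == u_label)  # True
    print(is_ldh_label(a_label))           # True: an A-label is also LDH in form
    print(is_ldh_label(u_label))           # False
```

The last two checks show the overlap mentioned in the definition: an A-label is made up solely of LDH characters, so it is the "xn--" prefix, not the character set, that distinguishes it from an ordinary ASCII label.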