
Guidelines for the Implementation of Internationalized Domain Names
Draft Version 2.0

20 September 2005


Draft Version 2.0 of these Guidelines was published on 20 September 2005. It reflects the experiences of the IDN registries in the implementation of Version 1.0 of the guidelines. Particular attention has been paid to concerns that have arisen about the deceptive use of visually confusable characters from different scripts in individual IDN labels.

The following steps were taken in the development of version 2.0:

A draft revision of version 1.0 of the IDN Guidelines was prepared by:

  • gTLD Registry Constituency Representatives:
    • Cary Karp, MuseDoma
    • Pat Kane, VeriSign
    • Ram Mohan, Afilias
  • ccNSO Representatives:
    • Hiro Hotta, JPRS
    • Mohammed EL Bashir, .sd Registry
  • ICANN Staff:
    • Tina Dam

The initial draft of version 2.0 is currently posted for public comments for 30 days and will subsequently be modified in accordance with commentary received during this period. A final draft of version 2.0 will then be submitted to the ICANN Board for endorsement.

The gTLD registries are required to establish IDN policies in conformance with these Guidelines and will implement support for the procedures detailed below with all due expediency. With the exception of administrative detail that is clearly specific to TLD operation, these Guidelines are intended to be implementable in other registries, on all levels.

As the deployment of IDNs proceeds, ICANN and the IDN registries will review these Guidelines at regular intervals and revise them further as experience indicates the need.

Previous Versions

Version 1.0: The ICANN Guidelines for the Implementation of Internationalized Domain Names, Version 1.0, were published on 20 June 2003, coinciding with the commencement of IDN deployment in accordance with the IETF Proposed Standard for Internationalized Domain Names in Applications as stated in RFCs 3454, 3490, 3491, and 3492. The implementation approach set forth in the Version 1.0 Guidelines was endorsed by the ICANN Board on 27 March 2003.

Guidelines

1. Top-level domain registries that implement internationalized domain name capabilities will do so in strict compliance with the technical requirements described in RFCs 3454, 3490, 3491, and 3492 (collectively, the "IDN standards").

2. In implementing the IDN standards, top-level domain registries will employ an "inclusion-based" approach (meaning that code points which are not explicitly permitted by the registry are prohibited) for identifying permissible sets of code points from among the full Unicode repertoire, as described below.
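
As a minimal sketch of the inclusion-based approach in Guideline 2, the following Python fragment accepts a label only if every code point appears in an explicitly listed table. The table contents and the names `PERMITTED_CODE_POINTS`, `rejected_code_points`, and `is_label_permitted` are hypothetical and chosen only for illustration; a real registry table would be derived from its published language or script policy.

```python
# Hypothetical registry table: only these code points may be registered.
# Anything absent from the table is prohibited by default.
PERMITTED_CODE_POINTS = (
    set(range(ord("a"), ord("z") + 1))     # a-z
    | set(range(ord("0"), ord("9") + 1))   # 0-9
    | {ord("-")}                           # hyphen
    | {0x00E4, 0x00F6, 0x00FC}             # ä ö ü (illustrative additions)
)


def rejected_code_points(label):
    """Return the code points in `label` that the table does not permit."""
    return [
        f"U+{ord(ch):04X}"
        for ch in label
        if ord(ch) not in PERMITTED_CODE_POINTS
    ]


def is_label_permitted(label):
    """A label is acceptable only if nothing in it falls outside the table."""
    return not rejected_code_points(label)
```

The design point is that the check never consults a list of forbidden characters: a code point newly added to Unicode is rejected automatically until the registry chooses to add it to its table.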

3. (a) In implementing the IDN standards, top-level domain registries will associate each label in a registered internationalized domain name, as it appears in their registry, with a single language or a single script using accepted designators for both. The restriction, in either case, is intended to limit the set of permitted characters within a label. If greater specificity is desired, the association may be made by combining both a language designator and a script designator. Alternatively, a label may be associated with a set of languages, or with more than one designator under the conditions described below. Language designators are illustrated in RFC 3066 (http://www.rfc-editor.org/rfc/rfc3066.txt). Script designators are illustrated in ISO 15924 and Unicode Technical Report #23 (http://www.unicode.org/reports/tr23/).
(b) A registry will publish the aggregate set of code points that it makes available in clearly identified IDN-specific character tables, and must define equivalent character variants if registration policies are established on their basis. Any such table must be designated in a manner that indicates the language(s) and/or script(s) it is intended to support.
(c) All code points in a single label must be taken from the same script as determined by the Unicode character properties (UTR#23). Exception to this is permissible for languages with established orthographies and conventions that require the commingled use of multiple scripts. Visually confusable characters from different scripts must not appear in a single label unless there are overriding legitimate linguistic reasons for doing so. Each such situation must be associated with a specific language and a corresponding character table must be available before registration of such names can be accepted.
(d) All registry policies based on these considerations must be documented and publicly available, including a character table for each permissible set of code points, before the registration of any IDN associated with such an aggregate may be accepted.
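
The single-script requirement in Guideline 3(c) can be illustrated with a rough check. The Python standard library does not expose the Unicode Script property, so the sketch below approximates a character's script by the first word of its Unicode character name (for example "LATIN" or "CYRILLIC"). That approximation, and the names `COMMON`, `approximate_scripts`, and `is_single_script`, are assumptions made for illustration; a production check would use the actual Script property data.

```python
import unicodedata

# ASCII digits and the hyphen are shared by all scripts and are ignored here.
COMMON = set("0123456789-")


def approximate_scripts(label):
    """Approximate each character's script by the first word of its name.

    This is a stand-in for the Unicode Script property, not an
    implementation of it.
    """
    scripts = set()
    for ch in label:
        if ch in COMMON:
            continue
        name = unicodedata.name(ch, "")
        scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts


def is_single_script(label):
    """True when all non-common characters appear to share one script."""
    return len(approximate_scripts(label)) <= 1
```

For example, `is_single_script("paypal")` holds, while a label in which the second letter is replaced by CYRILLIC SMALL LETTER A (U+0430) is reported as mixing two scripts. A Japanese label that combines kanji and hiragana would also be flagged by this sketch; that is the kind of case the exception in 3(c) anticipates, to be handled through a language-specific character table.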

4. Permissible code points will not include: (a) line symbol-drawing characters (such as those in the Unicode Box Drawing block), (b) symbols and icons that are neither alphanumeric nor ideographic language characters, such as typographical and pictographic dingbats, (c) punctuation characters that lack grammatical significance in the language with which the IDN registration is associated (with necessary punctuation including characters such as the ETHIOPIC WORDSPACE in Amharic and the MIDDLE DOT in Catalan), and (d) other characters with well-established functions as protocol elements. When a registry determines that an exception to any of these rules is appropriate, as discussed in Guideline #3, the basis for that decision must be documented in the IANA Registry for IDN Tables or otherwise made readily available online. A registry may not, even by exception, permit code points that are prohibited by the IDN standards.
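
Items (a) through (c) of Guideline 4 can be roughly approximated with Unicode general categories, as in the sketch below. Using the major categories "S" (symbol) and "P" (punctuation) as the screen is an assumption made here for illustration, not something the Guidelines prescribe, and item (d) on protocol elements is not covered. The function name `excluded_by_guideline_4` and its `allowed_punctuation` parameter are hypothetical.

```python
import unicodedata


def excluded_by_guideline_4(label, allowed_punctuation=frozenset()):
    """Return characters in `label` that are symbols or non-exempt punctuation.

    `allowed_punctuation` holds the punctuation a registry has documented as
    grammatically necessary for the language of the registration.
    """
    flagged = []
    for ch in label:
        # The hyphen is part of the basic label syntax and is always allowed.
        if ch == "-" or ch in allowed_punctuation:
            continue
        major = unicodedata.category(ch)[0]  # e.g. "L", "N", "S", "P"
        if major in ("S", "P"):
            flagged.append(ch)
    return flagged
```

Under these assumptions a Catalan table would pass MIDDLE DOT (U+00B7) in `allowed_punctuation`, so that a label such as "col·legi" is not flagged, while the same character is rejected for a language whose table does not list it.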

5. A registry must define the scope of an IDN registration in terms of both its Unicode and ASCII-encoded representations. The availability of a given Unicode sequence is currently determined by its encodability into the scheme defined in RFC 3491, and changes to that component of the IDN standard can have disruptive consequences for the operability of a Unicode name. For this reason an IDN registry should treat the ASCII-encoded form as the primary registered name, and include in its documentation a description of the factors that determine the way that sequence appears at the user interface.
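
The relationship between the two representations in Guideline 5 can be seen with Python's built-in `idna` codec, which implements the RFC 3490 conversions. The sketch below is illustrative only; the helper name `both_forms` is hypothetical.

```python
def both_forms(unicode_label):
    """Return (ASCII-encoded form, Unicode form recovered from it)."""
    # ToASCII: Nameprep normalization followed by Punycode with the xn-- prefix.
    ace = unicode_label.encode("idna").decode("ascii")
    # ToUnicode: what a user interface would display for the stored ACE name.
    displayed = ace.encode("ascii").decode("idna")
    return ace, displayed


print(both_forms("Bücher"))  # ('xn--bcher-kva', 'bücher')
```

Note that the displayed form is lowercase even though the input was capitalized: the ASCII-encoded name is what is actually stored, and the Unicode form shown to the user is derived from it.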

6. Top-level domain registries will work collaboratively with relevant and interested stakeholders to develop IDN-specific registration policies, with the objective of achieving consistent approaches to IDN implementation for the benefit of DNS users worldwide. Top-level domain registries will work collaboratively with each other to address common issues, for example by forming or appointing a consortium to coordinate contact with external communities, elicit the assistance of support groups, and establish global fora.

7. Top-level domain registries (and registrars) must make definitions of what constitutes an IDN registration and associated registration rules available to the IANA Registry for IDN Tables. If material fundamental to the understanding of a registry’s IDN policies is not published by the IANA, it must otherwise be made readily available online by the registry.

8. The top-level domain registries should provide resources containing information about the sources and references that were used in the formation of the corresponding IDN registration policies for all languages and scripts in which they offer IDN registrations.

Administrative details

For Registries Having Agreements with ICANN: Registries with sponsorship agreements or registry agreements with ICANN must also comply with the format requirements for Registered Names in their sponsorship or registry agreements. In one way or another, the agreements state that all Registered Names (including ACE names) will comply with the following syntax in augmented Backus-Naur Form (ABNF) as described in RFC 2234:

dot = %x2E ; "."
dash = %x2D ; "-"
alpha = %x41-5A / %x61-7A ; A-Z / a-z
digit = %x30-39 ; 0-9
ldh = alpha / digit / dash
id-prefix = alpha / digit
label = id-prefix [*61ldh id-prefix]
sldn = label dot label
hostname = *(label dot) sldn

In addition, length limitations should be observed.

To meet these requirements, the UseSTD3ASCIIRules flag described in RFC 3490 should be set when performing ToASCII conversions to produce ACE names, and the resulting format restriction should be interpreted as above.
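
The ABNF above can be transcribed into a regular expression to check an ACE name independently of whichever ToASCII implementation produced it. The following Python sketch is one such transcription; the names `LABEL`, `HOSTNAME_RE`, and `is_valid_registered_name` are hypothetical, and overall name-length limits are not enforced here.

```python
import re

# label = id-prefix [*61ldh id-prefix]
# One letter or digit, optionally followed by up to 61 letters, digits, or
# dashes and a closing letter or digit: 1 to 63 characters in total.
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"

# hostname = *(label dot) sldn, where sldn = label dot label
HOSTNAME_RE = re.compile(rf"(?:{LABEL}\.)*{LABEL}\.{LABEL}")


def is_valid_registered_name(name):
    """True when `name` matches the hostname production in full."""
    return HOSTNAME_RE.fullmatch(name) is not None
```

With this transcription, `xn--bcher-kva.example` is accepted, while names with a label that begins or ends with a dash, or that exceeds 63 characters, are rejected.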

For Unsponsored gTLDs: Rulesets based on these Guidelines must not interfere with the equivalent access to Registry Operator's Registry Services by all ICANN-Accredited Registrars that have Registry-Registrar Agreements in effect. Registry operators of unsponsored TLDs will ordinarily give thirty days notice to ICANN and accredited, authorized registrars of the establishment or revision of rulesets. In urgent situations, the registry operator and ICANN may agree in writing on a shorter time. Special terms may also be attached to the release of time-sensitive information, for example, in situations where land rush effects are anticipated.

Additional remarks

The deceptive use of visually confusable characters from different scripts is discussed in detail in the Unicode Technical Report #36 on ‘Unicode Security Considerations’ at http://www.unicode.org/reports/tr36/. Limitations to the character repertoire available for IDNs are suggested there in tables presented under the heading “Data files”.

The list of languages in ISO 639-2 is currently being revised in preparation for ISO 639‑3, which is in an advanced draft (as of the date of the present Guidelines). The normative reference to BCP47 made in the terms for the IANA IDN Language Table Registry will require modification when ISO 639‑3 is finalized, and it should also be noted that the IETF is currently dealing with the final draft of a successor document to that BCP. This will provide expanded means for specifying languages, including designations for script and orthographic authority as components of a language tag. That revision is being prepared by the IETF Language Tag Registry Update working group (ltru). As its work acquires formal normative status, the results may require further modification to the IDN Guidelines.

The aggregation of languages on the basis of their shared use of a single script (such as Latin-script African or European languages) may ease the development of focused IDN policies in technical and other regards, thus reducing potential for confusion. Unless there is need to associate individual labels in an IDN with different scripts, even where script-based policies are otherwise applied, the least confusing way to designate an IDN will often be by association with a single language. However, the current restriction of top-level labels to the 26-letter basic Latin alphabet will frequently necessitate that the language attributes of an IDN be determined without consideration of the top-level label. The discussion in progress about permitting a more extensive character repertoire in top-level labels may change this condition and may raise the need for further guidelines specific to the new situation.



Page Updated 21-Feb-2007
© 2003  The Internet Corporation for Assigned Names and Numbers. All rights reserved.