Standards
for ICANN Authorization of Internationalized Domain Name Registrations
in Registries with Agreements
Background
Internet domain names are easy-to-remember identifiers for hosts and
services on the Internet. Until now, only a subset of US-ASCII [1]
has been usable in domain names.
Over the past 2+ years, the Internationalized
Domain Name Working Group (IDN-WG) of the Internet Engineering Task
Force (IETF) has been working to internationalize the domain name system
at the application layer by standardizing a system for the translation
of non-ASCII characters into unique ASCII strings that can be resolved
by the existing domain name system.
In October 2002, a significant milestone toward internationalization
of the DNS was achieved when the Internet Engineering Steering Group
approved for publication the three documents that together define Internationalizing
Domain Names in Applications (IDNA), a standards-track protocol. [2]
These three documents were published as RFCs 3490,
3491, and 3492
in March 2003. Implementation of the IDNA specification by DNS registries
will allow users to use domain names with non-ASCII characters.
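By way of illustration, the translation that IDNA performs can be observed with the Python standard library, whose "idna" codec implements RFC 3490. This is a minimal sketch, not a registry implementation; the helper names to_ascii and to_unicode and the name "bücher.example" are assumptions chosen for illustration.

```python
# Sketch: round-trip a name through Python's built-in "idna" codec,
# which implements the IDNA algorithms of RFC 3490.

def to_ascii(name: str) -> str:
    """Convert each non-ASCII label to its ASCII-compatible form."""
    return name.encode("idna").decode("ascii")

def to_unicode(name: str) -> str:
    """Convert ASCII-compatible labels back to Unicode."""
    return name.encode("ascii").decode("idna")

ace = to_ascii("bücher.example")
print(ace)              # xn--bcher-kva.example
print(to_unicode(ace))  # bücher.example
```

The ASCII form is what the existing DNS resolves; the Unicode form is what the user sees in an IDNA-aware application.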
Implementation of IDNA brings significant risks for the domain name
system. In particular, serious concerns have been raised about the likelihood
of widespread user confusion and new opportunities for cybersquatting.
These risks can be greatly reduced by the adoption of sensible registry-level
policies, and by the coordination of consistent technical implementations
across DNS registries.
IDNA will be a major step forward for the domain name system, but only
if the DNS registries undertake to implement it in a thoughtful, responsible,
and standards-compliant manner. The role of ICANN in this process is,
of course, limited: ICANN's agreements with the major gTLD
registries give it the responsibility to expressly authorize the registration
of IDNA-compliant internationalized domain names, but ICANN's
mission does not include micromanaging registry-level implementation.
ICANN's agreements with registry operators (including the gTLDs .com,
.net, .org, .info, .biz, .name, .pro, .aero, .coop, .museum) provide
that ICANN authorization is required before the registry can begin accepting
registrations of internationalized domain names (IDNs). [3]
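As note 3 explains, the contractual hook is a reservation of all labels with hyphens in the third and fourth character positions, which is where the IDNA prefix "xn--" places them. A rough sketch of that test follows; the function names are hypothetical and are not drawn from any registry agreement.

```python
# Sketch: detect labels covered by the "hyphens in the third and fourth
# character positions" reservation, and the subset that uses the IDNA prefix.

def has_reserved_hyphens(label: str) -> bool:
    """True if characters 3 and 4 of the label are both hyphens."""
    return len(label) >= 4 and label[2:4] == "--"

def is_ace_label(label: str) -> bool:
    """True if the label starts with the IDNA prefix 'xn--' (RFC 3490)."""
    return label.lower().startswith("xn--")

print(has_reserved_hyphens("xn--1k2n4h4b"))  # True
print(has_reserved_hyphens("example"))       # False
print(is_ace_label("ab--cd"))                # False (reserved, but not IDNA)
```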
This paper sets forth a proposed answer to the question "What standards
should ICANN apply in exercising its contractual responsibility to authorize
IDNA registrations in these registries?" The basic premise of this
paper is that ICANN should take a light-handed approach, mandating compliance
with the applicable technical standards and securing a registry commitment
to collaborate with the affected communities, relevant experts, and
other registries in developing appropriate language-specific registration
policies.
This approach anticipates that the DNS registries will collectively
recognize the benefits to them and to the community of consistent cross-registry
implementation of the IDNA protocol and, flowing from that, of collaborating
with each other to develop common, locally-accepted IDNA implementations.
In that way, local expertise (about, for example, a given language's
character equivalence problems and solutions) can be developed and shared
globally, to the benefit of all DNS registries, registrars, application
software engineers, and, ultimately, Internet users. Implementing registries
have great incentives to recognize that global adoption of IDNA will
be greatly enhanced if they seek to harmonize, for example, their approaches
to registry-level character encoding and variant problems. These incentives
are appropriately highlighted by publication by registries of the rules
they apply to IDNs.
At the same time, the premise of this paper is that it would be a mistake
for ICANN to pursue a burdensome and/or intrusive approach to IDN implementation,
for example, by putting ICANN in the position of approving
a character-equivalence table for each language, and of maintaining
such tables. The deployment of IDNA within existing top-level domain
registries is fundamentally a registry responsibility, and the registries
will be in the best position to make appropriate implementation decisions
themselves, and should have the freedom to make adjustments as experience
dictates. Just as DNS registries embrace a wide diversity in registration
policies and administrative procedures, reflecting the diversity of
local Internet communities, it seems apparent that the vast diversity
of human character sets and the languages from which they come compels
a language-by-language, registry-led approach to the development of
detailed registration policies and administrative procedures.
Accordingly, this paper proposes an IDN deployment environment of relatively
few, vital requirements for implementing registries. This proposed approach
would provide a platform of technical compatibility and transparency
and promote a cooperative environment in which registries would consult
with the relevant user communities to establish registration procedures
that are widely adopted, predictable, and broadly acceptable to users.
This paper addresses the specific issue of authorizing the registration
of IDNA-compatible domain name strings in the DNS registries with agreements
with ICANN. It does not address other IDN policy and implementation
issues.
Standards
for ICANN Authorization of IDNA Registrations
As a condition to authorizing IDNA registrations, this paper proposes
that ICANN require the registries to agree to four mandatory points, while
encouraging them to adhere to two others. The paper synthesizes the
many discussions and debates (and occasional brawls) about IDNA implementation
that have taken place online and in various fora in recent months. In
addition, the paper draws heavily from the work of ICANN's IDN Committee,
and the Internet-draft "IDN Registration and Administration Guideline
for Chinese, Japanese, and Korean," <http://www.ietf.org/internet-drafts/draft-jseng-idn-admin-02.txt>,
by J. Seng and J. Klensin, eds., and K. Konishi, K. Huang, H. Qian,
and Y. Ko ("CJK Guidelines").
The first four points below are proposed as mandatory requirements
that the registries would have to accept as conditions
for ICANN authorization to begin accepting IDNA-compliant domain name
registrations. Points 5 and 6 are proposed as strong
recommendations to the gTLD registries (and, indeed, to all registries),
but would not be made mandatory.
1. Top-level domain registries that implement
internationalized domain name capabilities must do so only in strict
compliance with all applicable technical standards.
This point may seem obvious, but it is important enough to merit
restatement.
Registry compliance with technical standards is of absolutely critical
importance to the continued success of the domain name system. As
with the Internet protocol itself, the success of the DNS rests upon
the universal compatibility of its implementations at all points in
the network. Adherence to published standards has resulted in a DNS
that is reliable, scalable, resilient, predictable and globally authoritative.
Standards compliance by registries, registrars, application writers,
ISPs, etc., ensures that the DNS gives the same answer to a given
query no matter where the query originates, what kind of application
is generating the query, or what the nature of the identified host
or service is.
A DNS registry that does not comply with the published DNS technical
standards breaks the principle of universal compatibility, risks conflicts
and confusion, and jeopardizes its own utility to users.
For a new DNS protocol like IDNA, it is thus essential that implementing
registries take exceptional care to ensure comprehensive compliance
with the published specifications. The broad availability of IDNA
applications is vital to the deployment of IDNs and, to encourage
applications to be written, it is vital to provide those applications
an IDN registration system that operates strictly according to the
published technical standards.
2. In implementing IDNA, top-level domain registries
must employ an "inclusion-based" approach for identifying
permissible code points from among the full Unicode repertoire, and,
at the very least, must not include (a) line symbol-drawing characters,
(b) symbols and icons that are neither alphabetic nor ideographic
language characters, such as typographical and pictographic dingbats,
(c) punctuation characters, and (d) spacing characters.
The reasoning behind this point has been fully articulated by the
IDN Committee in its February 2002 "Input to the IETF on Permissible
Code Point Problems," posted at <http://www.icann.org/committees/idn/idn-codepoint-input.htm>,
and in the accompanying "Briefing Paper on Permissible Code Point
Problems," posted at <http://www.icann.org/committees/idn/idn-codepoint-paper.htm>.
(The paper explains "inclusion-based approach" this way:
"Start with the current restricted LDH ASCII characters (a-z,
A-Z, 0-9, -) and then extend [that table] to include relevant, non-problematical
'international' characters. Another way to state this model is: 'Everything
that is not explicitly permitted is prohibited.'") The IETF concluded
that this concern was best addressed at the registry level, through
registration policies and administrative procedures, rather than at
the protocol level.
The registries under contract will be expected to work together through
the IDN Registry Implementation Committee to reach a common definition
of the exact Unicode ranges described in sub-points (a), (b), (c),
and (d), above.
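An inclusion-based check can be sketched in a few lines. Everything specific here is an assumption made for illustration: the German extension set, the function names, and the use of Unicode general categories (S*, P*, Z*) as a rough stand-in for sub-points (a) through (d), whose exact ranges this paper leaves to the registries to define.

```python
import unicodedata

# Start from the LDH repertoire and extend it explicitly.
LDH = set("abcdefghijklmnopqrstuvwxyz0123456789-")
EXTRA_DE = set("äöüß")  # hypothetical registry-chosen extension
PERMITTED = LDH | EXTRA_DE

def in_excluded_class(ch: str) -> bool:
    """Approximate sub-points (a)-(d): symbols, punctuation, separators."""
    return unicodedata.category(ch)[0] in "SPZ"

# Sanity check on the extension itself: nothing from an excluded class.
assert not any(in_excluded_class(ch) for ch in EXTRA_DE)

def is_permitted_label(label: str) -> bool:
    """Everything that is not explicitly permitted is prohibited."""
    return bool(label) and all(ch in PERMITTED for ch in label.lower())

print(is_permitted_label("bücher"))   # True
print(is_permitted_label("bücher!"))  # False: punctuation
print(is_permitted_label("a─b"))      # False: box-drawing symbol
```

The permitted set, not the category test, is authoritative: the hyphen is itself a punctuation character, yet it is permitted because LDH lists it explicitly.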
3. In implementing IDNA, top-level domain registries
must (a) associate each registered domain name with one or more languages,
(b) employ language-specific registration and administration rules
that are documented and publicly available, such as the reservation
of all domain names with equivalent character variants in the languages
associated with the registered domain name, and, (c) where the registration
and administration rules depend on a character variants table, allow
registrations in a particular language only when a character variants
table for that language is available.
For speakers of the Latin-alphabet-based languages, the easiest way
to understand point 3 is to consider the following list of domain
names:
example.com
Example.com
eXample.com
exaMple.com
examPle.com
exampLe.com
examplE.com
EXAMPLE.com
ExAMPLE.com
EXaMPLE.com
EXAmPLE.com
EXAMpLE.com
EXAMPlE.com
EXAMPLe.com
etc.
In the languages that utilize Latin characters (e.g., English, Finnish,
German, Italian, etc.), each letter has two variants: upper case and
lower case. The Internet's basic DNS and hostname specifications provide
that the upper-case and lower-case variants of each letter are considered
to be equivalent. Thus, all the variant domain names in the above
list are treated as the same domain name.
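In code, the ASCII case rule amounts to reducing every variant to one canonical form before comparison. The short sketch below uses a handful of the names listed above; the variable names are illustrative only.

```python
# Sketch: upper-case and lower-case variants collapse to one domain name.
variants = ["example.com", "Example.com", "eXample.com",
            "EXAMPLE.com", "ExAMPLE.com", "EXAMPLe.com"]

canonical = {name.lower() for name in variants}
print(canonical)  # {'example.com'}
```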
Other languages' character sets present similar problems of variants
and equivalence, that is, cases in which one character (or Unicode
code point) is considered to be the same as another character (or
Unicode code point). "Sameness" in this context is often
extremely complicated to define: equivalence can be functional,
semantic, or visual, and it varies from language to language.
(For a detailed discussion of these issues in the context of the characters
used by the Chinese, Japanese, and Korean languages, see Section 1
of the CJK Guidelines
Internet-draft, above.) Accordingly, the global DNS registries
need to develop and implement a set of registration policies and administrative
procedures for each language prior to accepting IDN registrations
in that language's character set.
For nearly all languages, registry-level IDN policies will need to
incorporate a language-specific table of character variants and equivalences.
Some tables, such as the CJK table, are quite complex; others, such
as those for the Romance and Germanic languages (or other languages
that rely primarily on the Roman alphabet but also incorporate a limited
number of extra characters or accents), may be quite short and straightforward.
For example, one might imagine that the German language table will
address vowels with Umlauts and the ß (Eszett or "Scharfes
S"), perhaps specifying that, e.g., "ä" (a-umlaut)
is equivalent to "ae" (though perhaps not in all cases).
Indeed, a few languages (such as Creole, Indonesian, Malay,
Swahili, Xhosa, and Zulu) may not require any special
provision for character variants and equivalences (at least not for
their Latin-character orthographies). The point is that such determinations
will require careful study and close collaboration with local experts
and, where possible, relevant DNS registries.
The CJK Guidelines introduce the helpful concept of an IDN "package,"
meaning that when an IDN is registered, the registry applies the table
of character variants, determines which variations are considered
equivalent, and groups them all together into a single unit (for all
purposes, including transfer and deletion). [4]
If more than one language is associated with the registered IDN, then
each associated language's ruleset must be applied, with each generating
additional character variants to be included in the package. The entire
package is registered or reserved to the registrant, meaning that
the registration of a single IDN also brings with it the registration
or reservation of any and all equivalents. Whether or not the equivalent
domain names are treated as live registrations (i.e., are included
in the relevant zone file) and whether or not additional registration
fees are charged for equivalents are matters for each registry to
resolve for itself, after consultation with linguistic experts and
others in the affected community.
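The package idea can be sketched as follows. This is a simplified illustration, not the CJK Guidelines algorithm: the variant tables are hypothetical (the "de" entries follow the German example imagined above, and "demo" is invented), variants are expanded only in one direction from the registered label, and the names VARIANT_TABLES and build_package are assumptions.

```python
from itertools import product

# Hypothetical per-language variant tables: character -> equivalent strings.
VARIANT_TABLES = {
    "de": {"ä": {"ae"}, "ö": {"oe"}, "ü": {"ue"}, "ß": {"ss"}},
    "demo": {"ü": {"u"}},  # invented second ruleset, for illustration
}

def build_package(label: str, languages: list) -> set:
    """Apply each associated language's table and pool all variants."""
    package = set()
    for lang in languages:
        table = VARIANT_TABLES[lang]
        # For each character: the character itself plus its variants.
        choices = [sorted({ch} | table.get(ch, set())) for ch in label]
        package.update("".join(parts) for parts in product(*choices))
    return package

print(sorted(build_package("müßig", ["de"])))
# ['muessig', 'mueßig', 'müssig', 'müßig']
```

Associating a second language adds that ruleset's variants to the same package, so build_package("müßig", ["de", "demo"]) also contains "mußig". Whether the extra names are delegated or merely reserved is left, as above, to each registry.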
(Smart registries are likely to pursue registrant-friendly practices
and policies, because the competitive global market in domain name
registrations means that registries that mishandle IDNA implementation,
for example by failing to treat equivalents as equivalents,
will lose customers to other registries in the short
term, and will do long-lasting damage to their TLD brand. By way of
ASCII analogy, it doesn't take much imagination to recognize that
if the .foo registry began to allow different registrants to register
MICROSOFT.foo, microsoft.foo, microSOFt.foo, and so forth, both registrants
and Internet users would cease to regard the .foo TLD as a reliable
source of identifiers. The stakes in the implementation of IDNA are
just as great. User trust is at the heart of why it is so important
for registries to develop and apply appropriate, language-specific,
locally-legitimated registration policies for IDN registrations.)
As noted, to assure broad acceptance by Internet users, the creation
of language-specific policies and rules is essential and should be
done by the affected DNS registries in collaboration with local experts.
The creation of the CJK Guidelines document demonstrates that experts
and registries can work together to produce a carefully-considered,
well-documented, technically sensitive, and administratively implementable
set of rules and policies for a defined set of characters. The authors
of that document and the registries that supported them deserve
much credit for their hard work, and for taking
a pioneering lead in creating practical solutions to the many complex
problems presented by the implementation of IDNA. Ideally, a guidelines
document should be developed and published for each language in which
a DNS registry wishes to accept IDNA registrations.
Once a registry develops registration and administration rules for
a particular language, it is important that these rules be fully documented
and made available online. By making the rules fully available, all
the affected parties will be able to work toward the mutually beneficial
goals of predictability, simplicity, and uniformity. While different
registry operators should be free, with reasons that are compelling
to them, to adopt different registration and administration rules,
they should not be forced into taking different approaches simply
by ignorance of what others are doing. It is in the mutual interest
of users, application developers, and registration authorities alike
to promote full transparency of registration and administration rules
followed by all registries for all languages.
4. Registries must commit to working collaboratively
through the IDN Registry Implementation Committee to develop character
variants tables and language-specific registration policies, with
the objective of achieving consistent approaches to IDN implementation
for the benefit of DNS users worldwide.
This requirement is intended to assure the Internet community that
the registries under contract with ICANN will continue working together
for the common good of the DNS (together, we sincerely
hope, with those ccTLD registries working on IDN implementation). It
is extremely important that the introduction of IDNA preserve the
consistency, reliability, and inter-compatibility that characterize
the DNS. Registry-level collaboration in areas of common concern is
essential to accomplishing that goal. This collaboration requirement,
however, is not intended to constrain the freedom of registries to
choose which variants tables and registration policies to apply.
5. In implementing IDNA, top-level domain registries
should, at least initially, limit any given domain label (such as
a second-level domain name) to the characters associated with one
language only.
This principle is stated as a recommendation rather than a requirement.
At this early stage of IDNA implementation, it appears that massive
confusion and cybersquatting will result if a single domain name label
is allowed to include characters associated with different languages.
For example, the Roman, Greek, Cyrillic, and Armenian character sets
include numerous characters that appear to be identical but are separate
code points on the Unicode tables. Other languages, such as Turkish,
present similar problems.
Over time, after careful study and the accumulation of experience,
it may become clear that the limitation of single labels to single
languages' characters can be relaxed in some cases. But for now, at
the outset of IDNA implementation, it seems clear that potential risks
in terms of confusion and cybersquatting are enormous, and can easily
be avoided by application of this straightforward rule.
Note that this principle is written carefully to recognize that some
languages share characters with other languages, and that those shared
characters should not somehow be excluded. The "characters associated
with one language" may include characters used by other languages,
too.
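A crude approximation of this recommendation can be sketched by checking which scripts appear in a label. Two caveats apply: the recommendation is stated per language, not per script, so a real rule would use the registry's language tables; and taking the first word of each Unicode character name as the script is an assumption of this sketch, not a standard algorithm. The function name is hypothetical.

```python
import unicodedata

def scripts_in_label(label: str) -> set:
    """Rough script names for the letters in a label (digits, hyphen skipped)."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }

latin = "paypal"
mixed = "p\u0430ypal"  # second letter is CYRILLIC SMALL LETTER A

print(sorted(scripts_in_label(latin)))  # ['LATIN']
print(sorted(scripts_in_label(mixed)))  # ['CYRILLIC', 'LATIN']
```

The two labels look alike in most fonts but contain different code points; a registry following point 5 would decline the second.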
6. Top-level domain registries (and registrars)
should provide customer service capabilities in all languages for
which they offer internationalized domain name registrations.
This principle, too, is written as a common-sense recommendation,
not a requirement. Customers speaking a language whose characters
are offered as IDNs will reasonably expect to get customer service
in their language. It is, of course, up to registries and registrars
to determine what languages they will support. But if a registry or
registrar offers an IDN service targeted at speakers of a particular
language, it would be prudent to have the ability to communicate in
that language with registrants.
Conclusion
Subject to feedback and comments from the ICANN community, it is proposed
to apply the foregoing standards to the authorization of IDN registrations
by registries with agreements with ICANN. The actual procedure would
be quick and straightforward: the registry would submit to ICANN an
agreed statement of its commitment to abide by the required principles
stated in the first four points above. ICANN would, in turn, provide
written authorization to the registry to begin accepting IDNA-compliant
IDN registrations.
The ongoing IDN implementation tasks of common concern to all registries
implementing IDNA (such as the development of character
variant and equivalence tables in consultation with local experts and
affected registries) must proceed, through the IDN Registry
Implementation Committee and the various local and regional bodies that
have taken it on. To assure the rapid introduction of IDNs in the major
languages' character sets, the development of language-specific rulesets
must proceed in parallel, in both formally-constituted and ad hoc groupings
of experts and registries (both gTLDs and ccTLDs).
Notes
[1] The permitted subset of US-ASCII is often referred
to as the LDH code points, meaning the ASCII letters, digits, and embedded
hyphens.
[2] The three standards-track IETF documents comprising
IDNA are:
- P. Fältström, P. Hoffman and A. Costello, "Internationalizing
Domain Names in Applications (IDNA)," RFC 3490
- P. Hoffman and M. Blanchet, "Nameprep: A Stringprep Profile
for Internationalized Domain Names," RFC 3491
- A. Costello, "Punycode: A Bootstring encoding of Unicode for
IDNA," RFC 3492
[3] More precisely, each of those registry agreements
provides that the registry must reserve from initial registration "All
labels with hyphens in the third and fourth character positions"
(for example, xn--1k2n4h4b.org), except as expressly authorized by ICANN
in writing. See, for example, the .org registry agreement, Appendix
K: <http://www.icann.org/tlds/agreements/unsponsored/registry-agmt-appk-26apr01.htm>.
Under the recently-finalized IDNA protocol, all internationalized domain
names are converted into ASCII strings with hyphens in the third and
fourth character positions.
[4] Consistent with the standard usage in the DNS technical
documentation, the CJK Guidelines document correctly distinguishes between
a "domain name" and a "label." A "label"
in a domain name refers to a single segment or zone (i.e., the string
of characters that come between two sequential dots in a fully qualified
domain name); a "domain name" is all of the segments or zones
joined together. Thus, in the domain name "www.aso.icann.org,"
the "icann" portion constitutes one label. Because the CJK
Guideline document is intended to be applied on a zone-by-zone basis,
one label at a time, it focuses on "internationalized domain labels"
rather than "internationalized domain names", and thus speaks
of "IDL packages," rather than "IDN packages."
This paper, intended for a less technically oriented audience, uses
"internationalized domain name" or "IDN" to mean
"a domain name that contains at least one internationalized domain
label."
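The distinction drawn in note 4 can be made concrete with a short sketch. The function names are hypothetical, and treating any label that is non-ASCII or begins with "xn--" as an internationalized label is a simplifying assumption.

```python
def labels(domain_name: str) -> list:
    """Split a domain name into its labels."""
    return domain_name.rstrip(".").split(".")

def is_idn(domain_name: str) -> bool:
    """True if at least one label is internationalized."""
    return any(
        (not label.isascii()) or label.lower().startswith("xn--")
        for label in labels(domain_name)
    )

print(labels("www.aso.icann.org"))  # ['www', 'aso', 'icann', 'org']
print(is_idn("www.aso.icann.org"))  # False
print(is_idn("bücher.example"))     # True
```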
Page Updated
21-Mar-2003
©2003 The Internet Corporation for
Assigned Names and Numbers. All rights reserved.