String Similarity: IDN Variant Review Results

In an announcement on 26 February 2013, ICANN published a list of string similarity contention sets, in which two or more applied-for gTLD strings are identical or resemble one another visually so closely that they are likely to cause confusion. In ICANN's New gTLD Program, string similarity checks on Internationalized Domain Name (IDN) strings go beyond visual resemblance: they also cover potential IDN variant characters. The string similarity panel has now completed its review of these characters. Based on this review, two (2) contention sets of potential IDN variant strings have been identified.

U-Label | Unicode Code Points | A-Label | Application ID
盛贸饭店 | U+76DB, U+8D38, U+996D, U+5E97 | xn--hxt035czzpffl | 1-940-43388
盛貿飯店 | U+76DB, U+8CBF, U+98EF, U+5E97 | xn--hxt035cmppuel | 1-940-75591

点看 | U+70B9, U+770B | xn--3pxu8k | 1-1254-85868
點看 | U+9EDE, U+770B | xn--c1yn36f | 1-1254-86222
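
The relationship between the columns of the table above can be illustrated with a minimal Python sketch. It lists the Unicode code points of a U-label and derives an A-label by applying raw Punycode and prepending the "xn--" prefix. The helper names (code_points, to_a_label, to_u_label) are hypothetical and not part of any ICANN tooling. The sketch also skips the validation and normalization steps that a full IDNA implementation performs, so it should be read as an illustration rather than a conformant converter.

```python
# Illustrative sketch only: helper names are hypothetical, and raw Punycode
# is used without the validation a full IDNA implementation would apply.
ACE_PREFIX = "xn--"


def code_points(u_label):
    """Return the code points of a U-label in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in u_label]


def to_a_label(u_label):
    """Encode a U-label as an ASCII-compatible label (prefix + Punycode)."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label):
    """Decode a label produced by to_a_label back to its Unicode form."""
    if not a_label.startswith(ACE_PREFIX):
        raise ValueError("label does not start with the ACE prefix")
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


if __name__ == "__main__":
    # The four applied-for strings listed in the table above.
    for u_label in ["盛贸饭店", "盛貿飯店", "点看", "點看"]:
        print(u_label, ", ".join(code_points(u_label)), to_a_label(u_label))
```

Note that the two strings in each contention set share no encoded form: variant strings differ at the code point level, so their A-labels differ as well, which is why variants have to be identified by a dedicated review rather than by comparing A-labels.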

Section 1.3.3 of the Applicant Guidebook addresses this case: "Multiple applicants apply for strings that are identified by ICANN as variants of one another. These applications will be placed in a contention set and will follow the contention resolution procedures in Module 4." For more information, including the procedures used to perform the IDN variant review, see the IDN Variant TLDs - Integrated Issues Report, Appendix 5, at http://www.icann.org/en/topics/idn/idn-vip-integrated-issues-final-clean-20feb12-en.pdf [PDF, 2.1 MB].

The IDN Variant TLD Program is tasked with developing solutions and defining the processes that must be in place to enable future delegation of IDN variant TLDs. More information about this program can be found at: http://www.icann.org/en/resources/idn/variant-tlds

View all String Contention Sets: PDF [167 KB], CSV [67 KB]

Internationalized Domain Name (IDN)

IDNs are domain names that include characters used in the local representation of languages that are not written with the twenty-six letters of the basic Latin alphabet "a-z". An IDN can contain Latin letters with diacritical marks, as required by many European languages, or may consist of characters from non-Latin scripts such as Arabic or Chinese. Many languages also use other types of digits than the European "0-9". The basic Latin alphabet together with the European-Arabic digits are, for the purpose of domain names, termed "ASCII characters" (ASCII = American Standard Code for Information Interchange). These are also included in the broader range of "Unicode characters" that provides the basis for IDNs.

The "hostname rule" requires that all domain names of the type under consideration here are stored in the DNS using only the ASCII characters listed above, with the one further addition of the hyphen "-". The Unicode form of an IDN therefore requires special encoding before it is entered into the DNS.

The following terminology is used when distinguishing between these forms: A domain name consists of a series of "labels" (separated by "dots"). The ASCII form of an IDN label is termed an "A-label". All operations defined in the DNS protocol use A-labels exclusively. The Unicode form, which a user expects to be displayed, is termed a "U-label". The difference may be illustrated with the Hindi word for "test", परीका, appearing here as a U-label would (in the Devanagari script). A special form of "ASCII compatible encoding" (abbreviated ACE) is applied to this to produce the corresponding A-label: xn--11b5bs1di.

A domain name that only includes ASCII letters, digits, and hyphens is termed an "LDH label". Although the definitions of A-labels and LDH-labels overlap, a name consisting exclusively of LDH labels, such as "icann.org", is not an IDN.