
IDN Glossary

A

A-label

The ASCII-compatible encoded (ACE) representation of an internationalized domain name, i.e. how it is transmitted internally within the DNS protocol. A-labels always commence with the prefix "xn--". Contrast with U-label.

ACE (ASCII Compatible Encoding)

ACE is a system for encoding Unicode so each character can be transmitted using only a limited set of ASCII characters (i.e. a-z, 0-9 and "-"). This is used because applications that use the DNS protocol may not reliably handle other values.
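
As a minimal sketch of how a single U-label becomes an A-label and back, the Python example below uses the standard library's `punycode` codec (the RFC 3492 encoding) and adds or strips the "xn--" prefix. The helper names `to_a_label` and `to_u_label` are hypothetical. The sketch assumes the input is already a valid, lowercase, normalized label, and it performs none of the mapping or validity checks that IDNA requires.

```python
def to_a_label(u_label: str) -> str:
    """Return the A-label for one U-label (sketch: no IDNA validity checks)."""
    if u_label.isascii():
        # Pure-ASCII labels are left unchanged; only non-ASCII labels need ACE.
        return u_label
    # Punycode output uses only a-z, 0-9 and "-"; "xn--" marks it as an A-label.
    return "xn--" + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Return the U-label for an A-label, or the label unchanged otherwise."""
    if a_label.lower().startswith("xn--"):
        return a_label[4:].encode("ascii").decode("punycode")
    return a_label


print(to_a_label("bücher"))         # xn--bcher-kva
print(to_u_label("xn--bcher-kva"))  # bücher
```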

ASCII (American Standard Code for Information Interchange)

ASCII is a common numerical code for computers and other devices that work with text. Computers can only understand numbers, so an ASCII code is the numerical representation of a character such as 'a' or '@'. When mentioned in relation to domain names or strings, ASCII refers to the fact that before internationalization only the letters a-z, the digits 0-9, and the hyphen "-" were allowed in domain names.
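
To make the "numerical representation" concrete, this small Python sketch prints the code of a few characters with the built-in `ord()` and shows that a letter such as "ü" falls outside the ASCII range of 0 to 127, so it cannot be encoded as ASCII.

```python
# ord() returns a character's numeric code; for ASCII characters this is the ASCII code.
for ch in ["a", "@", "-", "0"]:
    print(repr(ch), ord(ch))  # prints 'a' 97, then '@' 64, '-' 45, '0' 48

# ASCII only defines codes 0-127, so "ü" (code 252) has no ASCII encoding.
print(ord("ü"))  # 252
try:
    "ü".encode("ascii")
except UnicodeEncodeError:
    print("U+00FC cannot be encoded as ASCII")
```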

C

Character

For the purposes of discussing IDNs, a "character" can best be seen as the basic graphic unit of a writing system, which is a script plus a set of rules determining how it is used to represent a specific language. However, domain labels do not convey any intrinsic information about the language with which they are intended to be associated, although they do reveal the script on which they are based. Unfortunately, this language dependency cannot be eliminated by restricting the definition to script, because in several cases (see examples below) languages that share the same script differ in the way they regard its individual elements. The term "character" therefore cannot be defined independently of the context in which it is used.

In phonetically based writing systems, a character is typically a letter or represents a syllable, and in ideographic systems (or alternatively, pictographic or logographic systems) a character may represent a concept or word.

The following examples illustrate that the definition of a character has at least two aspects: the linguistic base unit and the associated code point.

U-label 酒 : Jiu; the Chinese word for 'alcoholic beverage'; Unicode code point is U+9152 (also referred to as: CJK UNIFIED IDEOGRAPH-9152); A-label is xn--jj4a

U-label 北京 : the Chinese word for "Beijing", Unicode code points are U+5317 U+4EAC; A-label is xn--1lq90i

U-label 東京 : Japanese word for "Tokyo", the Unicode code points are U+6771 U+4EAC; A-label is xn--1lqs71d

U-label ایكوم; Farsi acronym for ICOM, Unicode code points are U+0627 U+06CC U+0643 U+0648 U+0645; A-label is xn--mgb0dgl27d.

U-label بحر; the Arabic word for "Sea", Unicode code points are U+0628 U+062D U+0631; A-label is xn--ngbkm.
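
The code points and A-labels of examples like these can be checked mechanically. The Python sketch below (standard library only) prints each example's code points in U+ notation and derives its A-label with the built-in `punycode` codec plus the "xn--" prefix. The U-labels are written as escapes so the exact code points are unambiguous; the list name and helper functions are illustrative, not part of any registry tool.

```python
# U-labels from the examples, as escapes (the Farsi example uses U+0643 ARABIC LETTER KAF).
EXAMPLES = [
    ("\u9152", "jiu, 'alcoholic beverage' (Chinese)"),
    ("\u5317\u4eac", "'Beijing' (Chinese)"),
    ("\u6771\u4eac", "'Tokyo' (Japanese)"),
    ("\u0627\u06cc\u0643\u0648\u0645", "acronym for ICOM (Farsi)"),
    ("\u0628\u062d\u0631", "'sea' (Arabic)"),
]


def code_points(label: str) -> str:
    """Format each character as U+ followed by at least four hex digits."""
    return " ".join(f"U+{ord(ch):04X}" for ch in label)


def a_label(u_label: str) -> str:
    """Punycode-encode a non-ASCII label and add the ACE prefix."""
    return "xn--" + u_label.encode("punycode").decode("ascii")


for u_label, gloss in EXAMPLES:
    # Only ASCII is printed, so the output is safe on any console.
    print(f"{gloss}: {code_points(u_label)} -> {a_label(u_label)}")
```

Comparing the printed values with a published list is a quick way to catch transcription errors in code points or A-labels.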

Country-code Name Supporting Organisation (ccNSO)

A component of ICANN's policy development forums (a "constituency") that is responsible for discussing and developing policy relating to how ccTLDs are delegated.

D

Domain Name Label

A constituent part of a domain name. The labels of domain names are connected by dots. For example, "www.iana.org" contains three labels — "www", "iana" and "org". For internationalized domain names, the labels may be referred to as A-labels and U-labels.
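
Because IDNA converts each label separately, a whole domain name is handled by splitting it at the dots, converting each label, and joining the results again. The Python sketch below shows this with the standard `punycode` codec; the function name `name_to_ascii` is hypothetical, and the sketch skips the validity checks a real IDNA implementation performs.

```python
def name_to_ascii(domain: str) -> str:
    """Convert a domain name to its ASCII form, one label at a time (sketch)."""
    out = []
    for label in domain.split("."):  # "www.iana.org" -> ["www", "iana", "org"]
        if label.isascii():
            out.append(label)  # ASCII labels pass through unchanged
        else:
            out.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(out)


print("www.iana.org".split("."))    # ['www', 'iana', 'org']
print(name_to_ascii("北京.test"))   # xn--1lq90i.test
```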

I

IDN (Internationalized Domain Name)

IDNs are domain names that include characters used in the local representation of languages that are not written with the twenty-six letters of the basic Latin alphabet "a-z". An IDN can contain Latin letters with diacritical marks, as required by many European languages, or may consist of characters from non-Latin scripts such as Arabic or Chinese. Many languages also use digits other than the European "0-9". The basic Latin alphabet together with the European-Arabic digits are, for the purpose of domain names, termed "ASCII characters" (ASCII = American Standard Code for Information Interchange). These are also included in the broader range of "Unicode characters" that provides the basis for IDNs.

The "hostname rule" requires that all domain names of the type under consideration here are stored in the DNS using only the ASCII characters listed above, with the one further addition of the hyphen "-". The Unicode form of an IDN therefore requires special encoding before it is entered into the DNS.

The following terminology is used when distinguishing between these forms:

A domain name consists of a series of "labels" (separated by "dots"). The ASCII form of an IDN label is termed an "A-label". All operations defined in the DNS protocol use A-labels exclusively. The Unicode form, which a user expects to be displayed, is termed a "U-label". The difference may be illustrated with the Hindi word for "test" — परीका — appearing here as a U-label would (in the Devanagari script). A special form of "ASCII compatible encoding" (abbreviated ACE) is applied to this to produce the corresponding A-label: xn--11b5bs1di.

A label that contains only ASCII letters, digits, and hyphens is termed an "LDH label". Although the definitions of A-labels and LDH labels overlap (every A-label is also an LDH label), a name consisting exclusively of LDH labels that are not A-labels, such as "icann.org", is not an IDN.
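
The distinctions above can be expressed as a simple classifier. In this Python sketch (function names hypothetical), a label is treated as an A-label if it is LDH and starts with "xn--", as a plain LDH label if it is otherwise LDH, and as a U-label candidate if it contains non-ASCII characters; a name counts as an IDN if any of its labels is an A-label or a U-label. It deliberately does not check that A-labels decode correctly or that U-labels contain only permitted characters.

```python
import re

# Letters, digits and hyphen only, as in the hostname rule.
LDH_RE = re.compile(r"[A-Za-z0-9-]+")


def classify_label(label: str) -> str:
    if LDH_RE.fullmatch(label):
        # A-labels are a subset of LDH labels, distinguished by the ACE prefix.
        return "A-label" if label.lower().startswith("xn--") else "LDH label"
    if not label.isascii():
        return "U-label"  # candidate only: not validated against IDNA rules
    return "invalid"  # ASCII, but contains characters outside LDH


def is_idn(domain: str) -> bool:
    return any(classify_label(label) in ("A-label", "U-label")
               for label in domain.split("."))


print(is_idn("icann.org"))           # False
print(is_idn("xn--11b5bs1di.test"))  # True
print(is_idn("परीका.test"))          # True
```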

IDN Practices Repository

A repository on IANA's website where top-level domain registries contribute the IDN tables they use.

IDN SLDs or IDN 2LDs

Usually refers to domain names with local characters at the second level, while the top level remains ASCII-only. For example: [παράδειγμα.test] ("example.test" in Greek).

IDN Table

An IDN Table lists all the characters that a particular TLD registry supports. If a character is considered a variant, this is indicated next to it, together with the character or characters of which it is a variant. An IDN Table usually holds the characters used by a specific language, or alternatively the characters of a specific script; it is therefore sometimes referred to as a language-based or a script-based IDN Table.
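
An IDN Table can be modelled as a mapping from each permitted character to the set of characters it is a variant of. The Python sketch below uses a tiny, hypothetical table (not taken from any real registry) built around the 名称/名稱 example from the "Variant Label" entry, and checks whether every character of a proposed label appears in the table.

```python
# Hypothetical IDN table: permitted character -> set of its variant characters.
IDN_TABLE: dict[str, set[str]] = {
    "名": set(),    # no variants in this toy table
    "称": {"稱"},   # simplified form, with the traditional form as its variant
    "稱": {"称"},   # traditional form, with the simplified form as its variant
}


def label_allowed(label: str, table: dict[str, set[str]]) -> bool:
    """True if every character of the label is listed in the IDN table."""
    return all(ch in table for ch in label)


def variant_chars(label: str, table: dict[str, set[str]]) -> list[str]:
    """Characters in the label that the table marks as having variants."""
    return [ch for ch in label if table.get(ch)]


print(label_allowed("名称", IDN_TABLE))   # True
print(label_allowed("名字", IDN_TABLE))   # False: 字 is not in the table
print(variant_chars("名称", IDN_TABLE))   # ['称']
```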

IDN TLDs

Short for internationalized top-level domains, which allow the entire domain name to be represented in local characters. For example: [실례.테스트] ("example.test" in Hangul).

IDNA (Internationalized Domain Names in Applications)

IDNA is a protocol defined by the Internet Engineering Task Force (http://www.ietf.org) in RFCs 5890, 5891, 5892, 5893, 5894 and 5895 and related RFCs. IDNA makes it possible for applications to handle domain names with non-ASCII characters, including converting domain name strings with non-ASCII characters into ASCII labels that applications using the DNS can reliably process. Not all characters used in the world's languages are available for use in domain names, so IDNA cannot convert every such character into an ASCII label.

L

Label

A label is an individual part of a domain name. Labels are usually shown separated by dots; for example, the domain name "example.com" is composed of two labels: "example", and "com".

Languages | Scripts | Alphabets

Languages are used by speech communities. Scripts are used to write down information in the various languages and this is done by using the corresponding alphabets or alternative writing systems.

LDH (Letter, Digit, Hyphen)

The hostname convention defined in RFC 952 (later modified by RFC 1123) was used by top-level domain Registries before internationalization. This meant that domain names could only practically contain the letters a-z, digits 0-9 and the hyphen "-". The term "LDH code points" refers to this subset. With the introduction of IDNs this rule is no longer relevant for all domain names although with the use of IDNA, what appears in the DNS remains LDH.
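
A check for the hostname (LDH) rule might look like the Python sketch below. It assumes the common reading of RFC 952 as relaxed by RFC 1123: each label is 1 to 63 characters long, consists of ASCII letters, digits and hyphens, and neither begins nor ends with a hyphen. The 63-character limit comes from the DNS itself (RFC 1035) rather than from the hostname documents. Note that an A-label such as "xn--bcher-kva" passes, because what IDNA puts into the DNS is still LDH.

```python
import re

# One LDH label: starts and ends with a letter or digit, hyphens allowed inside,
# 1 to 63 characters in total.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_hostname(name: str) -> bool:
    """True if every dot-separated label satisfies the LDH rule (sketch)."""
    return all(LDH_LABEL.fullmatch(label) for label in name.split("."))


print(is_ldh_hostname("www.iana.org"))      # True
print(is_ldh_hostname("-bad.org"))          # False: leading hyphen
print(is_ldh_hostname("bücher.ch"))         # False: non-ASCII letter
print(is_ldh_hostname("xn--bcher-kva.ch"))  # True
```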

Local Internet Community

The community of Internet users within a country who benefit from the country's top-level domain. Country-code top-level domains are delegated to sponsoring organizations to operate domains in the best interests of this community, particularly by implementing policies the community has developed.

P

Punycode

Punycode is the LDH-compatible encoding algorithm described in Internet standard [RFC3492], and in use today. This is the method that is used to encode IDNs into sequences of LDH ASCII characters in order for applications using the Domain Name System (DNS) to understand and manage the names. The intention is that domain name registrants and users will never see this encoded form of a domain name. The sole purpose is for the DNS to be able to resolve for example a URL containing local characters. For examples see A-label under "IDN".
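
To show what the encoding actually does, here is a compact Python transcription of the Punycode encoding procedure from RFC 3492, using the parameter values that RFC defines for Punycode. It is an illustrative sketch: it omits the overflow checks and the optional mixed-case annotation described in the RFC, and in practice the standard library's `punycode` codec is the safer choice. The function names are chosen here for illustration.

```python
# Punycode parameter values as specified in RFC 3492.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 0x80


def adapt(delta: int, numpoints: int, first_time: bool) -> int:
    """Bias adaptation function from RFC 3492."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def encode_digit(d: int) -> str:
    """Digit values 0-25 map to 'a'-'z', 26-35 map to '0'-'9'."""
    return chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)


def punycode_encode(text: str) -> str:
    """Punycode-encode a string (without the "xn--" prefix)."""
    code_points = [ord(ch) for ch in text]
    # Basic (ASCII) code points are copied first, in their original order.
    output = [ch for ch in text if ord(ch) < 0x80]
    b = h = len(output)
    if b > 0:
        output.append("-")  # delimiter between basic and encoded parts
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < len(code_points):
        m = min(c for c in code_points if c >= n)  # next code point to insert
        delta += (m - n) * (h + 1)
        n = m
        for c in code_points:
            if c < n:
                delta += 1
            elif c == n:
                # Write delta as a generalized variable-length integer.
                q, k = delta, BASE
                while True:
                    if k <= bias:
                        t = TMIN
                    elif k >= bias + TMAX:
                        t = TMAX
                    else:
                        t = k - bias
                    if q < t:
                        break
                    output.append(encode_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(encode_digit(q))
                bias = adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(output)


print(punycode_encode("bücher"))  # bcher-kva
print(punycode_encode("北京"))    # 1lq90i
```

Prefixing the result with "xn--" gives the A-label, for example xn--1lq90i for 北京.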

The prefix in a Punycode A-label is always "xn--". It is therefore recommended that top-level domain Registries reserve this prefix, to avoid confusion when and if registrations of IDNs are introduced under the respective top-level domain.
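
A registry that reserves the "xn--" prefix might screen registration requests along the lines of the Python sketch below. The policy it encodes is a hypothetical illustration: before IDNs are offered, every label starting with "xn--" is refused; afterwards, such a label is accepted only if it decodes as Punycode to a non-ASCII string and re-encodes to the same text, so malformed look-alike A-labels are rejected. The function name and the `idns_enabled` flag are assumptions, not part of any registry's actual system.

```python
def accept_label(label: str, idns_enabled: bool) -> bool:
    """Hypothetical registry check for labels carrying the reserved ACE prefix."""
    if not label.lower().startswith("xn--"):
        return True  # prefix not used; other policy checks would apply here
    if not idns_enabled:
        return False  # prefix reserved until IDN registrations are introduced
    encoded = label[4:].lower()
    try:
        decoded = encoded.encode("ascii").decode("punycode")
    except UnicodeError:
        return False  # not valid Punycode, e.g. a truncated string
    # The round trip must reproduce the label, and the result must be non-ASCII.
    return (not decoded.isascii()
            and decoded.encode("punycode").decode("ascii") == encoded)


print(accept_label("xn--jj4a", idns_enabled=False))  # False
print(accept_label("xn--jj4a", idns_enabled=True))   # True
print(accept_label("xn--jj4", idns_enabled=True))    # False
```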

S

Script

A script is a collection of symbols used for writing a language. There are three basic kinds of script. The first is alphabetic (e.g. Arabic, Cyrillic, Latin), whose individual elements are termed "letters". The second is ideographic (e.g. Chinese), whose elements are "ideographs". The third is the syllabary (e.g. Hangul), whose individual elements represent syllables. The writing systems of most languages use only one script, but there are exceptions such as Japanese, which uses four different scripts representing all three of the categories listed here.

In order to be used in the computing environment, each element of a script needs to be numerically encoded. A collection of symbols numbered in this fashion is called a "character set". A character set may include more than one script (e.g. the "Universal Character Set", popularly known as Unicode), or it may be restricted to a single script (e.g. US-ASCII, which, strictly speaking, does not even cover the entire Latin script). A rigorous distinction must be made between scripts and character sets.

The only character set relevant to IDNA is Unicode. This assigns a numerical "code point" and a "character name" to every element of every script. The script-based policies that ICANN attaches to IDNs will operate on the names of the scripts that appear in Unicode character names, or on the blocks in the Unicode Code Chart that are similarly headed with script names. These script names are listed at http://www.unicode.org/charts/.
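
Python's standard `unicodedata` module exposes the Unicode character name, which for most letters begins with the script name. The sketch below uses that first word as a rough script label. This is a heuristic assumed here for illustration only: it does not read the Unicode Script property (which `unicodedata` does not provide), it yields "CJK" rather than "Han" for Chinese ideographs, and shared characters such as digits have names like "DIGIT ZERO" with no script word at all.

```python
import unicodedata


def rough_script(ch: str) -> str:
    """First word of the Unicode character name (heuristic, not the Script property)."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


for ch in ["b", "\u0628", "\u092a", "\u9152"]:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), "->", rough_script(ch))
# U+0062 LATIN SMALL LETTER B -> LATIN
# U+0628 ARABIC LETTER BEH -> ARABIC
# U+092A DEVANAGARI LETTER PA -> DEVANAGARI
# U+9152 CJK UNIFIED IDEOGRAPH-9152 -> CJK
```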

For the purpose of the Fast Track Process, requesters must state the script in which the strings in their request are represented. In practice, the drop-down menu offered to requesters in the Fast Track Online Request System is based on the ISO 15924 list. For evaluation, Section 3.2 of the Fast Track Final Implementation Plan defines the various methods of validating script and language from which requesters may select. See http://icann.org/en/resources/idn/fast-track

It is important to note that characters in scripts which do not appear in the Unicode Code Chart are completely unavailable for inclusion in IDNs.

T

The Unicode Consortium

A not-for-profit organization founded to develop, extend and promote use of the Unicode standard. For more information, please visit http://www.unicode.org.

U

U-label

The Unicode representation of an internationalized domain name, i.e. how it is shown to the end-user. Contrast with A-label.

Unicode

Unicode is a widely used, single character encoding standard that provides a unique number for each character across a wide variety of languages and scripts. The Unicode standard contains tables that list the "code points" (unique numbers) for each local character identified. These tables continue to expand as more characters are digitized.

In Unicode, characters are assigned codes that uniquely define every character in many of the scripts in the world. These "code points" are unique numbers for a character or some character aspect such as an accent mark or ligature. Unicode supports more than a million code points, which are written with a "U" followed by a plus sign and the unique number in hexadecimal notation; for example, the word "Hello" is written U+0048 U+0065 U+006C U+006C U+006F.
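
The U+ notation is easy to reproduce: the Python sketch below formats each character's code point as "U+" followed by at least four uppercase hexadecimal digits, reproducing the "Hello" example, and shows the size of the code space via `sys.maxunicode`. The function name `to_u_notation` is illustrative.

```python
import sys


def to_u_notation(text: str) -> str:
    """Write each character as U+ and its code point in hex (at least 4 digits)."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


print(to_u_notation("Hello"))  # U+0048 U+0065 U+006C U+006C U+006F
print(sys.maxunicode + 1)      # 1114112 code points (U+0000 to U+10FFFF)
```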

UTF-8

UTF-8 (8-bit Unicode Transformation Format) is a system for encoding Unicode so that each character can be transmitted as a sequence of 8-bit numerical values (bytes). It is commonly used because 8-bit data transmission is prevalent on the Internet.
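
To illustrate how UTF-8 maps code points onto 8-bit values, the Python sketch below encodes a few characters and prints how many bytes each takes: one for ASCII characters, and up to four for characters further along the code space.

```python
samples = ["H", "\u00e9", "\u9152", "\U0001F600"]  # H, é, 酒 and an emoji
for ch in samples:
    data = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {len(data)} byte(s): {data.hex(' ')}")
# U+0048: 1 byte(s): 48
# U+00E9: 2 byte(s): c3 a9
# U+9152: 3 byte(s): e9 85 92
# U+1F600: 4 byte(s): f0 9f 98 80
```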

V

Variant Label

In the context of internationalized domain names, an alternative domain name that can be perceived as equivalent to another. Depending on registry policy, variants may be registered to the same registrant or blocked. For example, 名称 and 名稱 (both Míngchēng, meaning "Name") may be considered variants in Chinese.
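
Given a mapping of character variants, the possible variant labels of a label can be enumerated by replacing each character with itself or any of its variants. The Python sketch below does this with `itertools.product`; the two-entry variant mapping is a hypothetical illustration built from the 名称/名稱 example, not a real registry's table, and registry policy would then decide which of the resulting labels are allocated or blocked.

```python
from itertools import product

# Hypothetical variant mapping: character -> its variant characters.
VARIANTS: dict[str, set[str]] = {"称": {"稱"}, "稱": {"称"}}


def variant_labels(label: str) -> set[str]:
    """All labels obtained by replacing characters with their variants."""
    choices = [{ch} | VARIANTS.get(ch, set()) for ch in label]
    return {"".join(combo) for combo in product(*choices)}


print(sorted(variant_labels("名称")))  # both the 名称 and 名稱 forms
```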
