
Maximal Starting Repertoire

To give the Generation Panels a starting point for their work, the Integration Panel has created the maximal set of code points for the root zone under the Procedure to Develop and Maintain Label Generation Rules (LGR) for the Root Zone in Respect to IDN Labels. This set is called the Maximal Starting Repertoire (MSR). The Integration Panel may update the MSR based on feedback from the community and to accommodate relevant updates to the Unicode Standard. To send feedback on the latest version of the MSR, email IDNProgram@icann.org.

MSR-5 is the current version of the MSR. It covers 28 scripts: Arabic, Armenian, Bengali, Cyrillic, Devanagari, Ethiopic, Georgian, Greek, Gujarati, Gurmukhi, Han, Hangul, Hebrew, Hiragana, Kannada, Katakana, Khmer, Lao, Latin, Malayalam, Myanmar, Oriya, Sinhala, Tamil, Telugu, Thaana, Tibetan and Thai. MSR-5 contains 33,515 code points short-listed from Unicode version 11.0.
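Because the MSR is, at bottom, a set of code points, checking whether a candidate label draws only on that set reduces to a membership test. The Python sketch below illustrates this under stated assumptions: `HYPOTHETICAL_REPERTOIRE` is a made-up excerpt of a few code points standing in for the real table, not the published MSR-5 data, and the helper names are invented for illustration. Passing such a check would only mean that a label's code points fall within the maximal starting set; the Label Generation Rules that the Generation Panels develop from the MSR can restrict it further.

```python
# Hypothetical excerpt standing in for the MSR table (NOT the published MSR-5 data,
# which lists 33,515 code points across 28 scripts).
HYPOTHETICAL_REPERTOIRE: frozenset[int] = frozenset(
    list(range(0x0061, 0x007B))                   # Latin small letters a-z
    + [0x0915, 0x092A, 0x0930, 0x093E, 0x0940]    # a few Devanagari code points
)


def code_points_outside(label: str, repertoire: frozenset[int]) -> list[str]:
    """Return the code points of `label` (as U+XXXX) that are not in `repertoire`."""
    return [f"U+{ord(ch):04X}" for ch in label if ord(ch) not in repertoire]


def within_repertoire(label: str, repertoire: frozenset[int]) -> bool:
    """True if every code point of `label` belongs to `repertoire`."""
    return not code_points_outside(label, repertoire)


if __name__ == "__main__":
    print(within_repertoire("icann", HYPOTHETICAL_REPERTOIRE))     # True
    print(code_points_outside("icann1", HYPOTHETICAL_REPERTOIRE))  # ['U+0031']
```

Returning the offending code points, rather than just a boolean, makes it easier to see which characters fall outside the repertoire.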

Earlier Versions of Maximal Starting Repertoire

MSR-4 was released on 7 February 2019, covering the same 28 scripts as MSR-3: Arabic, Armenian, Bengali, Cyrillic, Devanagari, Ethiopic, Georgian, Greek, Gujarati, Gurmukhi, Han, Hangul, Hebrew, Hiragana, Kannada, Katakana, Khmer, Lao, Latin, Malayalam, Myanmar, Oriya, Sinhala, Tamil, Telugu, Thaana, Tibetan and Thai. MSR-4 contains 33,511 code points short-listed from 97,973 PVALID/CONTEXT code points of Unicode version 6.3.

MSR-3 was released on 29 March 2018, covering the same 28 scripts as MSR-2: Arabic, Armenian, Bengali, Cyrillic, Devanagari, Ethiopic, Georgian, Greek, Gujarati, Gurmukhi, Han, Hangul, Hebrew, Hiragana, Kannada, Katakana, Khmer, Lao, Latin, Malayalam, Myanmar, Oriya, Sinhala, Tamil, Telugu, Thaana, Tibetan and Thai. MSR-3 contains 33,496 code points short-listed from 97,973 PVALID/CONTEXT code points of Unicode version 6.3.

MSR-2 was released on 27 April 2015, covering the following 28 scripts: Arabic, Armenian, Bengali, Cyrillic, Devanagari, Ethiopic, Georgian, Greek, Gujarati, Gurmukhi, Han, Hangul, Hebrew, Hiragana, Kannada, Katakana, Khmer, Lao, Latin, Malayalam, Myanmar, Oriya, Sinhala, Tamil, Telugu, Thaana, Tibetan and Thai. MSR-2 contains 33,490 code points short-listed from 97,973 PVALID/CONTEXT code points of Unicode version 6.3.

The MSR-2 release consists of the following documents:

MSR-1 was released on 20 June 2014, covering the following 22 scripts: Arabic, Bengali, Cyrillic, Devanagari, Georgian, Greek, Gujarati, Gurmukhi, Han, Hangul, Hebrew, Hiragana, Kannada, Katakana, Lao, Latin, Malayalam, Oriya, Sinhala, Tamil, Telugu, and Thai. MSR-1 contains 32,790 code points short-listed from 97,973 PVALID/CONTEXT code points of Unicode version 6.3.

The MSR-1 release consists of the following documents:

Internationalized Domain Name (IDN): IDNs are domain names that include characters used in the local representation of languages that are not written with the twenty-six letters of the basic Latin alphabet "a-z". An IDN can contain Latin letters with diacritical marks, as required by many European languages, or may consist of characters from non-Latin scripts such as Arabic or Chinese. Many languages also use digits other than the European "0-9". For the purpose of domain names, the basic Latin alphabet together with the European-Arabic digits is termed "ASCII characters" (ASCII = American Standard Code for Information Interchange). These are also included in the broader range of "Unicode characters" that provides the basis for IDNs.

The "hostname rule" requires that all domain names of the type under consideration here be stored in the DNS using only the ASCII characters listed above, with the one further addition of the hyphen "-". The Unicode form of an IDN therefore requires special encoding before it is entered into the DNS.

The following terminology is used to distinguish between these forms. A domain name consists of a series of "labels" separated by dots. The ASCII form of an IDN label is termed an "A-label"; all operations defined in the DNS protocol use A-labels exclusively. The Unicode form, which a user expects to be displayed, is termed a "U-label". The difference can be illustrated with the Hindi word for "test", परीका, shown here as a U-label would appear (in the Devanagari script). A special form of "ASCII Compatible Encoding" (ACE) is applied to it to produce the corresponding A-label: xn--11b5bs1di. A label that contains only ASCII letters, digits and hyphens is termed an "LDH label". Although the definitions of A-labels and LDH labels overlap, a name consisting exclusively of LDH labels, such as "icann.org", is not an IDN.
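The U-label to A-label conversion in the Hindi example is the Punycode algorithm (RFC 3492) placed behind the "xn--" ACE prefix. The Python sketch below uses Python's built-in `punycode` codec to reproduce that example and to show that an A-label is itself an LDH label. It is an illustration under stated assumptions, not a compliant IDNA implementation: the helper names are hypothetical, no IDNA2008 validation or normalization is performed, and `is_ldh_label` checks only the character set, not hyphen placement or label length limits.

```python
ACE_PREFIX = "xn--"  # prefix that marks an A-label


def is_ldh_label(label: str) -> bool:
    """True if the label uses only ASCII letters, digits and hyphens.
    Simplified: ignores hyphen-position rules and the 63-octet length limit."""
    return bool(label) and all(ch.isascii() and (ch.isalnum() or ch == "-") for ch in label)


def to_a_label(u_label: str) -> str:
    """Encode a U-label as an A-label using raw Punycode (no IDNA2008 checks)."""
    if u_label.isascii():
        return u_label  # already ASCII, nothing to encode
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Decode an A-label back to its Unicode form; other labels pass through."""
    if not a_label.lower().startswith(ACE_PREFIX):
        return a_label
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


if __name__ == "__main__":
    a = to_a_label("परीका")
    print(a)                # xn--11b5bs1di
    print(is_ldh_label(a))  # True: every A-label is also an LDH label
    print(to_u_label(a) == "परीका")  # True
```

For production use, a library that implements the full IDNA2008 rules (RFC 5891) would be needed in place of the raw codec shown here.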