
Maximal Starting Repertoire Version 2 (MSR-2) for the Development of Label Generation Rules for the Root Zone

To support IDN labels in the root zone, the ICANN community, at the direction of the Board, undertook several projects to study the viability and delegation of such labels and to make recommendations. As part of implementing the procedure described below, ICANN is pleased to announce that the Integration Panel has released the second version of the Maximal Starting Repertoire (MSR-2). This version is upwardly compatible with MSR-1 and adds six scripts to the repertoire. The MSR is the first deliverable under the Procedure to Develop and Maintain Label Generation Rules (LGR) for the Root Zone in Respect to IDN Labels [PDF, 772 KB] (the Procedure) and is the starting point for the work of community-based Generation Panels in developing their LGR proposals. The LGR for the Root Zone is a mechanism for creating and maintaining rules with respect to IDN labels for the root.

MSR-2 covers the following 28 scripts, of which six (marked with *) are new in this version: Arabic, Armenian*, Bengali, Cyrillic, Devanagari, Ethiopic*, Georgian, Greek, Gujarati, Gurmukhi, Han, Hangul, Hebrew, Hiragana, Kannada, Katakana, Khmer*, Lao, Latin, Malayalam, Myanmar*, Oriya, Sinhala, Tamil, Telugu, Thaana*, Tibetan* and Thai. MSR-2 contains 33,490 code points, short-listed from the 97,973 PVALID/CONTEXT code points of Unicode version 6.3.

This release of MSR-2 sets the stage for the work of the Generation Panels. In addition to selecting the repertoire for their LGR proposals from within the MSR, Generation Panels will evaluate whether any of the selected code points are variants of one another and whether any rules are needed to further constrain the labels generated using these code points. The resulting LGR proposals will be released for public comment before the Integration Panel reviews them for integration into the Root Zone LGR. If it becomes necessary to stage the release of the LGR, for example because not all Generation Panels are able to submit proposals at the same time, subsequent versions of the LGR may be released.
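As a rough illustration of the three things a Generation Panel decides (a repertoire drawn from the MSR, variant relationships, and label-level rules), the sketch below models them as a small Python class. The class name `ToyLGR`, its fields, the sample repertoire, and the sample rule are all hypothetical and exist only for illustration; actual LGR proposals are far richer than this.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Set


@dataclass
class ToyLGR:
    """Hypothetical, highly simplified model of a Label Generation Ruleset."""
    msr: FrozenSet[int]                       # code points permitted by the MSR
    repertoire: Set[int]                      # subset selected by a Generation Panel
    variants: Dict[int, Set[int]] = field(default_factory=dict)
    rules: List[Callable[[str], bool]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A panel may only select code points that the MSR contains.
        outside = self.repertoire - self.msr
        if outside:
            raise ValueError(f"code points outside the MSR: {sorted(outside)}")

    def is_valid_label(self, label: str) -> bool:
        """True if every code point is in the repertoire and every rule passes."""
        if not label:
            return False
        if any(ord(ch) not in self.repertoire for ch in label):
            return False
        return all(rule(label) for rule in self.rules)

    def variant_labels(self, label: str) -> Set[str]:
        """Labels produced by swapping one code point for one of its variants."""
        out: Set[str] = set()
        for i, ch in enumerate(label):
            for v in self.variants.get(ord(ch), ()):
                out.add(label[:i] + chr(v) + label[i + 1:])
        return out


# Toy data: a pretend MSR of a-z, a three-letter repertoire, one variant pair,
# and one length rule. None of this reflects real MSR or LGR content.
toy = ToyLGR(
    msr=frozenset(range(ord("a"), ord("z") + 1)),
    repertoire={ord("a"), ord("b"), ord("c")},
    variants={ord("a"): {ord("b")}},
    rules=[lambda label: len(label) <= 63],
)
```

With this toy data, `toy.is_valid_label("abc")` accepts the label, while `toy.is_valid_label("abd")` rejects it because `d` was never selected from the MSR.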

MSR-2 defers some code points that are already encoded in Unicode 7.0, because authoritative tables for IDNA 2008 are not yet available for Unicode 7.0. Unicode 8.0, due in 2015, is expected to add further code points that are potentially eligible for the Root Zone. In addition, the Integration Panel monitors any scripts not included in the MSR for indications that a change in status is warranted. A later version of the MSR will be developed if additional repertoire exists for which inclusion in the MSR is warranted. Until such a version is developed, MSR-2 will be the foundation for any LGR versions developed after its release. All future versions of the MSR and all versions of the LGR must retain full backwards compatibility.
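The backwards-compatibility requirement can be sketched as a set comparison. The sketch assumes, as a simplification, that compatibility between MSR versions means no code point is ever removed; for LGR versions, variants and rules would also have to be considered. The function names and the toy repertoires are hypothetical.

```python
from typing import AbstractSet, Set


def removed_code_points(older: AbstractSet[int], newer: AbstractSet[int]) -> Set[int]:
    """Code points present in the older repertoire but missing from the newer one."""
    return set(older) - set(newer)


def is_backward_compatible(older: AbstractSet[int], newer: AbstractSet[int]) -> bool:
    """Assumption: a newer repertoire is compatible if it removes nothing."""
    return not removed_code_points(older, newer)


# Illustrative stand-ins only; these are NOT the actual MSR-1 or MSR-2 contents.
toy_msr_1 = {0x0627, 0x0628, 0x0915}
toy_msr_2 = toy_msr_1 | {0x0561, 0x1200}
```

Here `is_backward_compatible(toy_msr_1, toy_msr_2)` holds because the second set only adds code points, while the reverse comparison fails.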

The MSR-2 release consists of the following documents:

Internationalized Domain Name (IDN): IDNs are domain names that include characters used in the local representation of languages that are not written with the twenty-six letters of the basic Latin alphabet "a-z". An IDN can contain Latin letters with diacritical marks, as required by many European languages, or may consist of characters from non-Latin scripts such as Arabic or Chinese. Many languages also use other types of digits than the European "0-9". The basic Latin alphabet together with the European-Arabic digits are, for the purpose of domain names, termed "ASCII characters" (ASCII = American Standard Code for Information Interchange). These are also included in the broader range of "Unicode characters" that provides the basis for IDNs.

The "hostname rule" requires that all domain names of the type under consideration here are stored in the DNS using only the ASCII characters listed above, with the one further addition of the hyphen "-". The Unicode form of an IDN therefore requires special encoding before it is entered into the DNS.

The following terminology is used when distinguishing between these forms: A domain name consists of a series of "labels" (separated by "dots"). The ASCII form of an IDN label is termed an "A-label". All operations defined in the DNS protocol use A-labels exclusively. The Unicode form, which a user expects to be displayed, is termed a "U-label". The difference may be illustrated with the Hindi word for "test", परीका, appearing here as a U-label would (in the Devanagari script). A special form of "ASCII compatible encoding" (abbreviated ACE) is applied to this to produce the corresponding A-label: xn--11b5bs1di. A domain name that only includes ASCII letters, digits, and hyphens is termed an "LDH label". Although the definitions of A-labels and LDH-labels overlap, a name consisting exclusively of LDH labels is not an IDN.
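The relationship between a U-label and its A-label can be sketched with the `punycode` codec in Python's standard library, adding the `xn--` ACE prefix by hand. The helper names `to_a_label`, `to_u_label`, and `is_ldh_label` are hypothetical, and the sketch deliberately skips the validation, normalization, and contextual checks that IDNA 2008 requires. The LDH pattern assumes the usual hostname conventions (1 to 63 characters, no leading or trailing hyphen).

```python
import re

ACE_PREFIX = "xn--"

# Assumed hostname-style pattern: letters, digits, hyphens; 1-63 characters;
# no hyphen at either end.
LDH_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """True if the label uses only ASCII letters, digits, and hyphens."""
    return LDH_RE.fullmatch(label) is not None


def to_a_label(u_label: str) -> str:
    """Encode a Unicode label as an ACE label (no IDNA 2008 validation)."""
    if u_label.isascii():
        return u_label  # already ASCII; nothing to encode
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Decode an ACE label back to its Unicode form."""
    if a_label.lower().startswith(ACE_PREFIX):
        return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return a_label


u_label = "परीका"                 # the example U-label from the text
a_label = to_a_label(u_label)
print(a_label)                    # xn--11b5bs1di
print(to_u_label(a_label))        # परीका
print(is_ldh_label(a_label))      # True: an A-label is also an LDH label
```

The round trip shows why the DNS only ever needs to store ASCII: the A-label carries all the information required to recover the U-label for display.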