
Maximal Starting Repertoire Version 1 (MSR-1) for the Development of Label Generation Rules for the Root Zone

To support IDN labels in the root zone, the ICANN community, at the direction of the Board, undertook several projects to study and make recommendations on their viability and delegation. One of these projects is the implementation of the Procedure to Develop and Maintain the Label Generation Rules for the Root Zone in Respect of IDNA Labels (the Procedure) allowing for the development of Label Generation Rules (LGR) for the Root Zone. The LGR for the Root Zone is a mechanism for creating and maintaining rules with respect to IDN labels for the root.

In the context of the implementation of the Procedure, ICANN is pleased to announce that the Integration Panel has released the first version of the Maximal Starting Repertoire (MSR-1). The MSR-1 is the first deliverable from the Integration Panel under the Procedure and will serve as a fixed collection of code points from which Generation Panels may make a selection in constructing the repertoire for their respective LGR proposals.

The MSR-1 covers the following 22 scripts: Arabic, Bengali, Cyrillic, Devanagari, Georgian, Greek, Gujarati, Gurmukhi, Han, Hangul, Hebrew, Hiragana, Kannada, Katakana, Lao, Latin, Malayalam, Oriya, Sinhala, Tamil, Telugu, and Thai. MSR-1 contains 32,790 code points short-listed from the 97,973 code points of Unicode version 6.3 that IDNA2008 classifies as PVALID or CONTEXT (CONTEXTJ or CONTEXTO).

This release of MSR-1 sets the stage for the work of the Generation Panels. In addition to selecting their repertoires from within the MSR when developing LGR proposals, Generation Panels will evaluate whether any of these code points are variants and whether any rules are needed to further constrain the labels that can be generated from them. The LGR proposals produced by the Generation Panels will be released for public comment before the Integration Panel reviews them for integration into the Root Zone LGR. If it becomes necessary to stage the release of the LGR, for example because not all Generation Panels are able to submit proposals at the same time, subsequent versions of the LGR may be released.
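
The containment relationship described above (a panel's repertoire must be drawn from the MSR, and labels are built only from that repertoire) can be illustrated with plain set operations. The following is a minimal sketch, not the Integration Panel's tooling: the function names and the tiny code point sets are hypothetical stand-ins for the real MSR-1 and any real Generation Panel repertoire, and variant sets and whole-label rules are deliberately not modelled.

```python
from __future__ import annotations


def outside_msr(repertoire: frozenset[int], msr: frozenset[int]) -> set[int]:
    """Return code points a proposed repertoire uses that are not in the MSR.

    An empty result means the proposal stays within the MSR.
    """
    return set(repertoire - msr)


def label_uses_repertoire(label: str, repertoire: frozenset[int]) -> bool:
    """True if the label is non-empty and every code point belongs to the repertoire.

    Only membership is checked; variants and whole-label rules, which the
    Generation Panels also define, are out of scope for this sketch.
    """
    return bool(label) and all(ord(ch) in repertoire for ch in label)


# Hypothetical toy data: Latin a-z plus Greek small alpha..omega stand in for the MSR.
TOY_MSR = frozenset(range(0x0061, 0x007B)) | frozenset(range(0x03B1, 0x03CA))
# Hypothetical selection by a Greek panel: small alpha..omega only.
TOY_GREEK_REPERTOIRE = frozenset(range(0x03B1, 0x03CA))

print(outside_msr(TOY_GREEK_REPERTOIRE, TOY_MSR))             # set()
print(outside_msr(TOY_GREEK_REPERTOIRE | {0x0391}, TOY_MSR))  # {913}, capital alpha is not in the toy MSR
print(label_uses_repertoire("αβγ", TOY_GREEK_REPERTOIRE))      # True
print(label_uses_repertoire("abc", TOY_GREEK_REPERTOIRE))      # False
```

Representing repertoires as sets of integer code points keeps the subset check to a single set difference, which also reports exactly which code points fall outside the MSR.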

MSR-1 defers some of the eligible scripts in order to balance timeliness with comprehensiveness. At a later stage, another version of the MSR will be developed to include repertoires for the deferred scripts and to make other additions to the repertoire as warranted. That version would serve as the foundation for any subsequent LGR versions. All future versions of the MSR, and all versions of the LGR, must remain fully backward compatible.
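
One narrow reading of the backward compatibility requirement, assumed here for illustration only, is that a later MSR version may add code points but must never remove any. A minimal sketch of that check follows; the function names and the toy version contents are hypothetical, and real LGR compatibility also covers variant dispositions and label rules, which this sketch does not address.

```python
from __future__ import annotations


def diff_repertoires(old: frozenset[int], new: frozenset[int]) -> tuple[set[int], set[int]]:
    """Compare two repertoire versions and return (removed, added) code points."""
    removed = set(old - new)
    added = set(new - old)
    return removed, added


def is_backward_compatible(old: frozenset[int], new: frozenset[int]) -> bool:
    """Assumed rule: a newer version may only add code points, never remove them."""
    removed, _ = diff_repertoires(old, new)
    return not removed


# Hypothetical toy versions, not real MSR contents.
msr_v1 = frozenset({0x0061, 0x0062, 0x0063})   # a, b, c
msr_v2 = msr_v1 | {0x00E9}                     # adds e-acute
msr_bad = frozenset({0x0061, 0x0063, 0x00E9})  # drops b, adds e-acute

print(is_backward_compatible(msr_v1, msr_v2))   # True
print(is_backward_compatible(msr_v1, msr_bad))  # False
print(diff_repertoires(msr_v1, msr_bad))        # ({98}, {233})
```

Reporting both the removed and the added code points, rather than a bare yes/no, makes it easier to see exactly which change breaks compatibility.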

The MSR-1 release consists of the following documents:

Internationalized Domain Name (IDN)

IDNs are domain names that include characters used in the local representation of languages that are not written with the twenty-six letters of the basic Latin alphabet "a-z". An IDN can contain Latin letters with diacritical marks, as required by many European languages, or may consist of characters from non-Latin scripts such as Arabic or Chinese. Many languages also use digits other than the European "0-9". The basic Latin alphabet together with the European-Arabic digits are, for the purpose of domain names, termed "ASCII characters" (ASCII = American Standard Code for Information Interchange). These are also included in the broader range of "Unicode characters" that provides the basis for IDNs.

The "hostname rule" requires that all domain names of the type under consideration here are stored in the DNS using only the ASCII characters listed above, with the one further addition of the hyphen "-". The Unicode form of an IDN therefore requires special encoding before it is entered into the DNS.

The following terminology is used to distinguish between these forms. A domain name consists of a series of "labels" separated by dots. The ASCII form of an IDN label is termed an "A-label"; all operations defined in the DNS protocol use A-labels exclusively. The Unicode form, which a user expects to see displayed, is termed a "U-label". The difference can be illustrated with the Hindi word for "test", shown here as a U-label in the Devanagari script: परीका. A special form of ASCII Compatible Encoding (ACE) is applied to it to produce the corresponding A-label: xn--11b5bs1di. A label that contains only ASCII letters, digits, and hyphens is termed an "LDH label". Although the definitions of A-labels and LDH labels overlap, a name consisting exclusively of LDH labels is not an IDN.
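
The U-label to A-label conversion described above can be reproduced with codecs from the Python standard library. This is a small illustrative sketch: Python's built-in "idna" codec implements the older IDNA 2003 rules rather than IDNA 2008 (the third-party `idna` package targets IDNA 2008), but both apply the same Punycode algorithm, and for this particular label both should yield the same A-label.

```python
# U-label from the example above, in the Devanagari script.
u_label = "परीका"

# Full ToASCII conversion: Punycode plus the "xn--" ACE prefix.
a_label = u_label.encode("idna").decode("ascii")
print(a_label)  # xn--11b5bs1di

# Raw Punycode output, without the ACE prefix.
raw_punycode = u_label.encode("punycode").decode("ascii")
print(raw_punycode)  # 11b5bs1di

# Converting the A-label back recovers the original U-label.
round_trip = a_label.encode("ascii").decode("idna")
print(round_trip == u_label)  # True

# A plain LDH label passes through unchanged; it is not an IDN.
ldh = "example".encode("idna").decode("ascii")
print(ldh)  # example
```

The "xn--" prefix is what marks an LDH label as an A-label; the remainder is the Punycode encoding of the U-label's code points.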