Maximal Starting Repertoire Version 2 (MSR-2) for the Development of Label Generation Rules for the Root Zone

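To support IDN labels in the root zone, the ICANN community, at the direction of the Board, has undertaken several projects to study the feasibility and delegation of such labels and to make related recommendations. As part of implementing this work, ICANN is pleased to announce that the Integration Panel has released the second version of the Maximal Starting Repertoire (MSR-2). This version, which is upward compatible with MSR-1, adds six scripts to the repertoire. The MSR is the first deliverable under the "Procedure to Develop and Maintain the Label Generation Rules for the Root Zone in Respect of IDNA Labels" [PDF, 772 KB] (the "Procedure"), and it is the basis on which the community-formed Generation Panels develop their respective LGR proposals. The LGR for the Root Zone is a mechanism that defines the rules for creating and maintaining IDN labels in the root zone.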

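MSR-2 covers the following 28 scripts, of which the 6 marked with an asterisk (*) are new additions: Arabic, Armenian*, Bengali, Cyrillic, Devanagari, Ethiopic*, Georgian, Greek, Gujarati, Gurmukhi, Han, Hangul, Hebrew, Hiragana, Kannada, Katakana, Khmer*, Lao, Latin, Malayalam, Myanmar*, Oriya, Sinhala, Tamil, Telugu, Thaana*, Tibetan*, and Thai. MSR-2 contains a shortlist of 33,490 code points, drawn from the 97,973 code points listed as PVALID or CONTEXT for Unicode version 6.3.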

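The release of MSR-2 sets the stage for the Generation Panels to do their work. In addition to selecting code points from the MSR for their LGR proposals, the Generation Panels will review whether any of these code points are variants of one another and whether additional rules are needed to further constrain the labels generated with them. Each Generation Panel's finalized LGR proposal will be published for public comment before it is reviewed by the Integration Panel for the Root Zone LGR. Where the LGR needs to be released more than once, for example because not all Generation Panels can submit their proposals at the same time, subsequent versions of the LGR may be released.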
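The relationship described above, in which a Generation Panel may only select code points that the MSR already contains, can be sketched as a set operation. This is a minimal illustration under stated assumptions: the `MSR` set below is a tiny hypothetical stand-in (basic Latin lowercase plus Greek lowercase), not the real 33,490-code-point repertoire, and the function names are invented for this example. Variant and whole-label rules are not modeled.

```python
# Hypothetical stand-in for the MSR: a-z plus Greek alpha..omega.
# The real MSR-2 repertoire is far larger; this is for illustration only.
MSR = set(range(0x0061, 0x007B)) | set(range(0x03B1, 0x03CA))


def build_lgr_repertoire(selected, msr):
    """Return the selected code points, rejecting any that fall outside the MSR."""
    selected = set(selected)
    outside = selected - set(msr)
    if outside:
        listed = ", ".join(f"U+{cp:04X}" for cp in sorted(outside))
        raise ValueError(f"code points not in the MSR: {listed}")
    return selected


def label_in_repertoire(label, repertoire):
    """Check that every character of a label belongs to the repertoire."""
    return all(ord(ch) in repertoire for ch in label)


# A hypothetical Greek-only proposal drawn from the stand-in MSR.
greek = build_lgr_repertoire(range(0x03B1, 0x03CA), MSR)
print(label_in_repertoire("αβγ", greek))  # all three letters are in the selection
print(label_in_repertoire("abc", greek))  # Latin letters were not selected
```

A real LGR would add variant mappings and whole-label evaluation rules on top of this repertoire check.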

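Because the official IDNA 2008 tables for Unicode 7.0 are not yet available, MSR-2 defers some code points that have already been encoded in Unicode 7.0. Unicode 8.0 is expected to be released in 2015 and will add further code points that may be eligible for use in the root zone. In addition, the Integration Panel monitors any script not yet included in the MSR for indications of a change in status that would warrant its inclusion. Subsequent versions of the MSR will therefore be prepared once additional repertoire has been approved for inclusion. Until a new version of the MSR is released, MSR-2 remains the basis for the development of any LGR. All subsequent versions of the MSR and all versions of the LGR must remain fully backward compatible.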
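One way to read the backward-compatibility requirement at the repertoire level is that a later version never drops a code point that an earlier version allowed. The sketch below assumes that reading; it is not the Procedure's formal definition, and the two version sets are hypothetical placeholders rather than real MSR data.

```python
def is_backward_compatible(old_repertoire, new_repertoire):
    """True if every code point allowed before is still allowed now."""
    return set(old_repertoire) <= set(new_repertoire)


def removed_code_points(old_repertoire, new_repertoire):
    """List the code points that would break compatibility, in sorted order."""
    return sorted(set(old_repertoire) - set(new_repertoire))


# Hypothetical version data: v2 adds Armenian lowercase to v1's Latin lowercase.
v1 = set(range(0x0061, 0x007B))
v2 = v1 | set(range(0x0561, 0x0587))

print(is_backward_compatible(v1, v2))  # v2 only adds code points
print(removed_code_points(v2, v1))     # what would be lost going from v2 back to v1
```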

The MSR-2 release consists of the following documents:

Internationalized Domain Name (IDN): IDNs are domain names that include characters used in the local representation of languages that are not written with the twenty-six letters of the basic Latin alphabet "a-z". An IDN can contain Latin letters with diacritical marks, as required by many European languages, or may consist of characters from non-Latin scripts such as Arabic or Chinese. Many languages also use other types of digits than the European "0-9". The basic Latin alphabet together with the European-Arabic digits are, for the purpose of domain names, termed "ASCII characters" (ASCII = American Standard Code for Information Interchange). These are also included in the broader range of "Unicode characters" that provides the basis for IDNs.

The "hostname rule" requires that all domain names of the type under consideration here are stored in the DNS using only the ASCII characters listed above, with the one further addition of the hyphen "-". The Unicode form of an IDN therefore requires special encoding before it is entered into the DNS. The following terminology is used when distinguishing between these forms: A domain name consists of a series of "labels" (separated by "dots"). The ASCII form of an IDN label is termed an "A-label". All operations defined in the DNS protocol use A-labels exclusively. The Unicode form, which a user expects to be displayed, is termed a "U-label". The difference may be illustrated with the Hindi word for "test", परीका, appearing here as a U-label would (in the Devanagari script). A special form of "ASCII compatible encoding" (abbreviated ACE) is applied to this to produce the corresponding A-label: xn--11b5bs1di.

A domain name that only includes ASCII letters, digits, and hyphens is termed an "LDH label". Although the definitions of A-labels and LDH-labels overlap, a name consisting exclusively of LDH labels, such as "" is not an IDN.
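The conversion between a U-label and its A-label can be sketched with the `punycode` codec that ships with Python. This is a minimal illustration, not a full IDNA 2008 implementation: it performs no normalization, case mapping, or validity checks, and the helper names `to_a_label` and `to_u_label` are hypothetical.

```python
ACE_PREFIX = "xn--"  # marks a label as ASCII-compatible encoded


def to_a_label(u_label):
    """Encode a Unicode label as an A-label; pure-ASCII labels pass through."""
    if u_label.isascii():
        return u_label
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label):
    """Decode an A-label back to Unicode; labels without the prefix pass through."""
    if a_label[: len(ACE_PREFIX)].lower() != ACE_PREFIX:
        return a_label
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


print(to_a_label("bücher"))         # xn--bcher-kva
print(to_u_label("xn--bcher-kva"))  # bücher
print(to_a_label("example"))        # example (an LDH label, left unchanged)
```

Production code would normally rely on a library that implements the complete IDNA 2008 rules rather than on the bare encoding step shown here.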