
Collaborating towards a Truly Multilingual Internet

People across the globe use a wide variety of scripts and writing systems. To pave the way for the next billion people to come online, we need to support access to a multilingual Internet.

Making complete domain names available in local languages requires effective support for many different scripts. To contribute to this, ICANN has undertaken a significant effort to determine which top-level domain labels (the local-language counterparts of labels like .org, .asia, .ca and .sg), and which of their variant labels, are valid.

My earlier blog post called on the communities that use different scripts and writing systems to engage with ICANN and define label generation rules for their scripts. These rules will provide guidance on which labels are valid in each local script and writing system, while maintaining the security and stability of the domain name system.

We received very enthusiastic responses from many communities. They have formalized or are forming groups working to develop these rules for their respective scripts and writing systems. These include: Arabic, Armenian, Bengali, Chinese, Cyrillic, Devanagari, Gujarati, Gurmukhi, Japanese, Kannada, Khmer, Korean, Malayalam, Oriya, Tamil, Telugu and Tibetan. These communities have been meeting both face-to-face and using online conferencing tools for their discussions and work.

The work of these groups will pave the way for a truly multilingual Internet in the near future.

In addition to the work within these specific script communities, there is also a need for some communities to collaborate with each other. It has been very heartening to see many different models of such collaboration emerging.

Chinese, Japanese and Korean Community Panels

A primary example is the coordination and collaboration among the Chinese, Japanese and Korean communities, which share the Han script. Coordination began soon after these communities organized, and it covers two aspects:

First, after each community has identified the Han characters it wants to use, the communities need to determine how many of these characters are also used by the others. So far, 6000 such characters have been identified for top-level domains, and the discussions are still ongoing.
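The overlap question in this first aspect is, at its core, a set calculation. The minimal Python sketch below assumes each community's repertoire is available as a set of characters. The `REPERTOIRES` data is hypothetical and tiny (the real repertoires contain thousands of Han characters), and `shared_by_all` and `pairwise_overlap` are illustrative names, not part of any ICANN tool.

```python
from itertools import combinations

# Hypothetical, deliberately tiny repertoires for illustration only.
REPERTOIRES: dict[str, set[str]] = {
    "Chinese": {"机", "機", "中", "国", "学"},
    "Japanese": {"机", "機", "中", "国", "学"},
    "Korean": {"机", "機", "中", "國", "學"},
}


def shared_by_all(repertoires: dict[str, set[str]]) -> set[str]:
    """Characters that appear in every community's repertoire."""
    return set.intersection(*repertoires.values())


def pairwise_overlap(repertoires: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Number of characters shared by each pair of communities."""
    return {
        (a, b): len(repertoires[a] & repertoires[b])
        for a, b in combinations(sorted(repertoires), 2)
    }


if __name__ == "__main__":
    print(sorted(shared_by_all(REPERTOIRES)))
    print(pairwise_overlap(REPERTOIRES))
```

With real repertoires, the same two functions would report how many characters all three communities share and how many each pair shares.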

The second aspect is determining the variants of these characters. This is especially challenging because characters that look exactly the same can have different usage or meaning across the communities. For example, 机 means 'machine' in Chinese but 'desk' in Japanese and Korean, whereas 機 (the traditional form of 机 in its 'machine' sense) means 'machine' in all three communities.
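The variant question can be sketched in the same way. In this minimal Python example, each community has its own hypothetical variant table (`VARIANT_TABLES`), and the illustrative helper `variant_labels` generates every label reachable by substituting variants. It shows the idea only: label generation rules written in the XML format of RFC 7940 also give each variant a disposition (for example, blocked or allocatable) and can carry contextual rules, none of which is modelled here.

```python
from itertools import product

# Hypothetical per-community variant tables. In Chinese, 机 is the simplified
# form of 機 ('machine'), so the two are mapped as variants of each other.
# In Japanese and Korean, 机 means 'desk' and 機 means 'machine', so no
# mapping between them is defined.
VARIANT_TABLES: dict[str, dict[str, set[str]]] = {
    "Chinese": {"机": {"機"}, "機": {"机"}},
    "Japanese": {},
    "Korean": {},
}


def variant_labels(label: str, table: dict[str, set[str]]) -> set[str]:
    """Every label obtained by replacing each character with itself or one of
    its variants; the original label is always included."""
    choices = [sorted({ch} | table.get(ch, set())) for ch in label]
    return {"".join(combo) for combo in product(*choices)}


if __name__ == "__main__":
    for community, table in VARIANT_TABLES.items():
        print(community, sorted(variant_labels("手机", table)))
```

Under the Chinese table, the label 手机 yields two labels, 手机 and 手機, while under the Japanese or Korean table it yields only itself, which is why one community's variant table cannot simply be reused by another. The number of variant labels also grows multiplicatively with label length.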

These community-based panels need to keep working together to identify and overcome these challenges. It is no simple feat, but resolving these issues will help deploy a secure and stable top-level domain name system.

The representatives of these communities have been meeting face-to-face at ICANN meetings to define the scope of the challenge and work out solutions, and they have made significant progress. They recently shared their progress with the larger community at the Asia Pacific Regional Internet Governance Forum (APrIGF) in Macau.

Chinese, Japanese and Korean panel representatives at APrIGF

Neo-Brahmi Community Panel

The newly formed community-based panel for Neo-Brahmi (NB) scripts is handling a different challenge.

While Chinese, Japanese and Korean communities have to coordinate the single Han script across the multiple panels, the Neo-Brahmi community has to coordinate multiple scripts within a single panel.

The community chose this approach because these scripts share many traits that are best handled uniformly. One example is the Akshara, whose characteristics are explained by Abhijit Dutta, a member of the Neo-Brahmi Panel: "The alphabet of an NB script has vowels, consonants and special modifiers. An NB script has a 'character' as its main standalone constituent. This character, unlike the 'English' character, can be a combination of many different elements from the alphabet, numbering from 1 to potentially 9 or 10. Each character is independent and its distinguishing feature is that it can be pronounced in fullness and exist as a complete visual and phonetic unit."
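To make the Akshara concrete, the Python sketch below splits Devanagari text into approximate Aksharas with a regular expression: zero or more consonant + virama pairs, a final consonant, then an optional vowel sign (or a final virama) and optional modifiers; an independent vowel with optional modifiers also counts as one unit. This is a simplified assumption for illustration, not the Neo-Brahmi panel's definition: it covers only part of the Devanagari block, ignores ZWJ/ZWNJ and the other Neo-Brahmi scripts, and the names `AKSHARA` and `aksharas` are hypothetical.

```python
import re

# Simplified Devanagari character classes (a subset of the block, for
# illustration): consonants with an optional nukta, independent vowels,
# dependent vowel signs, the virama, and candrabindu/anusvara/visarga.
CONSONANT = "[\u0915-\u0939\u0958-\u095f]\u093c?"
INDEPENDENT_VOWEL = "[\u0904-\u0914]"
VOWEL_SIGN = "[\u093e-\u094c]"
VIRAMA = "\u094d"
MODIFIERS = "[\u0900-\u0903]*"

AKSHARA = re.compile(
    # (consonant + virama)* consonant, then an optional vowel sign or final virama
    "(?:" + CONSONANT + VIRAMA + ")*" + CONSONANT
    + "(?:" + VOWEL_SIGN + "|" + VIRAMA + ")?" + MODIFIERS
    # or an independent vowel with optional modifiers
    + "|" + INDEPENDENT_VOWEL + MODIFIERS
    # or any other single character, so no input is silently dropped
    + "|.",
    re.DOTALL,
)


def aksharas(text: str) -> list[str]:
    """Split Devanagari text into approximate Aksharas."""
    return AKSHARA.findall(text)


if __name__ == "__main__":
    print(aksharas("संस्कृत"))
```

For example, `aksharas("संस्कृत")` returns `["सं", "स्कृ", "त"]`; the middle unit combines four code points (स, the virama, क and the vowel sign ृ) into one pronounceable unit, as described in the quote above.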

Akshat Joshi from the Centre for Development of Advanced Computing

The Neo-Brahmi script panel recently met in Pune, India, to chart out their challenges and how to best address them.

Neo-Brahmi Script Panel Meeting in Pune

The Lao script panel has already identified the need for similar collaboration, as it may need to work with the Thai and possibly Khmer script communities. There is an equally challenging need for cross-panel collaboration among the Armenian, Cyrillic, Greek and Latin scripts. The Armenian script panel has already published a comparative analysis as part of its proposal and posted it for public comment.
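One reason such cross-panel work matters is that several letters in these scripts can look identical, such as Latin o, Cyrillic о, Greek ο and Armenian օ. The Python sketch below checks whether a label has a letter-for-letter lookalike written entirely in another script. The `LOOKALIKES` groups are a small hypothetical sample, not taken from the Armenian panel's analysis, and `script_of` is a crude stand-in that reads the first word of the Unicode character name, because Python's `unicodedata` module does not expose the Unicode Script property.

```python
import unicodedata
from typing import Optional

# Hypothetical lookalike groups: characters from different scripts that can
# render identically or nearly so in common fonts.
LOOKALIKES: list[set[str]] = [
    {"o", "\u043e", "\u03bf", "\u0585"},  # Latin o, Cyrillic o, Greek omicron, Armenian oh
    {"h", "\u0570"},  # Latin h, Armenian ho
    {"n", "\u0578"},  # Latin n, Armenian vo
    {"u", "\u057d"},  # Latin u, Armenian seh
]


def script_of(ch: str) -> str:
    """Crude script guess from the first word of the Unicode character name."""
    return unicodedata.name(ch).split()[0]


def lookalike_in_script(label: str, target: str) -> Optional[str]:
    """Return a label written entirely in `target` that mimics `label`
    character by character, or None if any character has no counterpart."""
    result = []
    for ch in label:
        # Characters outside every group only match themselves.
        group = next((g for g in LOOKALIKES if ch in g), {ch})
        candidates = sorted(c for c in group if script_of(c) == target)
        if not candidates:
            return None
        result.append(candidates[0])
    return "".join(result)


if __name__ == "__main__":
    print(lookalike_in_script("hou", "ARMENIAN"))
    print(lookalike_in_script("hou", "CYRILLIC"))
```

Here the Latin label `hou` has an all-Armenian lookalike, հօս, while the Cyrillic lookup returns `None` because this sample table has no Cyrillic counterpart for h.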

Asia Pacific is home to many complex scripts and writing systems. Therefore, equally substantial contributions and collaboration are needed to ensure that these scripts are carefully and securely integrated to support top-level domains. A first step is to raise awareness of this ongoing work; the next is to facilitate coordination among script communities. ICANN looks forward to the exciting partnerships being developed within and across communities in our region to enable a truly multilingual Internet.

Jia-Rong Low and Samiran Gupta also contributed to this report. Sarmad Hussain is the Senior Manager of the ICANN IDN Program. Jia-Rong is the Senior Director of Strategies and Initiatives for ICANN APAC and Samiran is the Head of India for ICANN APAC.

Internationalized Domain Name (IDN): IDNs are domain names that include characters used in the local representation of languages that are not written with the twenty-six letters of the basic Latin alphabet "a-z". An IDN can contain Latin letters with diacritical marks, as required by many European languages, or may consist of characters from non-Latin scripts such as Arabic or Chinese. Many languages also use other types of digits than the European "0-9". The basic Latin alphabet together with the European-Arabic digits are, for the purpose of domain names, termed "ASCII characters" (ASCII = American Standard Code for Information Interchange). These are also included in the broader range of "Unicode characters" that provides the basis for IDNs.

The "hostname rule" requires that all domain names of the type under consideration here are stored in the DNS using only the ASCII characters listed above, with the one further addition of the hyphen "-". The Unicode form of an IDN therefore requires special encoding before it is entered into the DNS.

The following terminology is used when distinguishing between these forms: A domain name consists of a series of "labels" (separated by "dots"). The ASCII form of an IDN label is termed an "A-label". All operations defined in the DNS protocol use A-labels exclusively. The Unicode form, which a user expects to be displayed, is termed a "U-label". The difference may be illustrated with the Hindi word for "test", परीका, appearing here as a U-label would (in the Devanagari script). A special form of "ASCII compatible encoding" (abbreviated ACE) is applied to this to produce the corresponding A-label: xn--11b5bs1di. A domain name that only includes ASCII letters, digits, and hyphens is termed an "LDH label". Although the definitions of A-labels and LDH-labels overlap, a name consisting exclusively of LDH labels, such as "icann.org", is not an IDN.
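The U-label to A-label conversion described in this entry can be sketched with Python's standard `punycode` codec, which implements the RFC 3492 encoding; prefixing its output with "xn--" gives the A-label. This is a minimal sketch under stated assumptions: `to_a_label` and `to_u_label` are hypothetical helpers that perform no IDNA2008 validation and apply no label generation rules, so a production system would rely on a dedicated IDNA2008 implementation instead.

```python
def to_a_label(u_label: str) -> str:
    """Convert a U-label to an A-label: Punycode plus the ACE prefix "xn--".

    Sketch only: no IDNA2008 validation (permitted code points, normalization,
    bidi or contextual rules) and no label generation rules are applied.
    """
    if u_label.isascii():
        return u_label  # LDH labels are stored in the DNS unchanged
    return "xn--" + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Reverse of to_a_label for labels carrying the "xn--" prefix."""
    if a_label.lower().startswith("xn--"):
        return a_label[4:].encode("ascii").decode("punycode")
    return a_label


if __name__ == "__main__":
    print(to_a_label("परीका"))
    print(to_u_label("xn--11b5bs1di"))
```

Applied to the example above, `to_a_label("परीका")` gives `xn--11b5bs1di`, and `to_u_label` turns it back into the U-label; an LDH label such as `icann` passes through unchanged.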