How the ICANN Community Helped Pave the Way for a More Multilingual Internet

22 March 2022
By Sarmad Hussain

ICANN's mission is to help ensure a stable, secure, and unified global Internet. But what exactly is required to support a global Internet? Most of the world's population does not speak English as a first language or write their language using only the letters a-z. In fact, only slightly more than a third of the world's population uses the Latin script and an even smaller number uses only the letters a-z.

In an effort to make the Internet and Domain Name System (DNS) more accessible for its diverse, global users, the community has worked for many years to introduce Internationalized Domain Names (IDNs). IDNs enable people around the world to use domain names in local languages and scripts, such as Arabic, Chinese, Cyrillic, Devanagari, Thai, and many more.

The process to enable a complete IDN, including top-level domain (TLD) labels, is intricate and time-consuming, and ICANN helps facilitate it. One part of the process is to develop a consistent and transparent mechanism to determine valid IDN TLD labels and their variant labels for the different scripts used by communities globally. Over the past eight years, various script communities have formed Generation Panels (GPs). These panels are made up of DNS, language, and script experts who work together to develop the rules needed to form TLDs in their respective scripts in a stable and secure way. The rules are developed through the Root Zone Label Generation Rules (RZ-LGR) procedure, which is now being considered by the community as the mechanism to validate new generic top-level domains (gTLDs) in the next round and country code top-level domains (ccTLDs), and to define their variant labels.

Later this week, the fifth version of the Root Zone Label Generation Rules (RZ-LGR-5), which integrates a total of 26 scripts, will be open for Public Comment. These scripts are used to write hundreds of languages around the world. This is a remarkable achievement and a true testament to the multistakeholder model at work. ICANN org is proud to have supported the script communities in these efforts, as the RZ-LGR is an important tool used to enable broader access to a global and multilingual Internet.

As we approach the publication of RZ-LGR-5, which is expected to include all of the work conducted by active GPs, I want to take the time to acknowledge and celebrate the dedication and accomplishments of these panels. Since the first GP was formed in 2014, the community has:

  • Created 17 Generation Panels covering 26 scripts
  • Engaged 270-plus script community volunteers
  • Logged 10,000-plus volunteer hours
  • Finalized 26 unique scripts covered by LGR proposals (as of today): Arabic, Armenian, Bangla, Chinese (Han), Cyrillic, Devanagari, Ethiopic, Georgian, Greek, Gujarati, Gurmukhi, Hebrew, Japanese (Hiragana, Katakana, and Kanji [Han]), Kannada, Khmer, Korean (Hangul and Hanja [Han]), Lao, Latin, Malayalam, Myanmar, Oriya, Sinhala, Tamil, Telugu, and Thai.

GP Fun Facts:

  • Each GP is typically made up of 7 to 15 volunteer members, but membership can be much larger depending on how a GP is organized.
  • The largest GP is the Neo-Brahmi Script Generation Panel, which covers 9 scripts and has 66 members from Bangladesh, India, Nepal, Sri Lanka, and Singapore.
  • The first GP, the Arabic Script GP, was established in 2014.
  • The total number of languages supported by GP work: 386-plus
  • The total number of countries represented by GP members: 44
  • The number of ICANN Public Comment proceedings to develop RZ-LGR so far: 30-plus

ICANN org will continue to support other script communities to form GPs, based on the RZ-LGR procedure.

So what exactly do GPs do and why does their work matter? Find out more below.

What do Generation Panels do?

What's so complicated about enabling different scripts in IDNs? Because scripts and writing systems around the world differ in nature, some scripts require contextual rules to form a label without rendering issues. There is also the possibility of end-user confusion when characters are perceived as the same by the script user but are actually different; these are called variants (e.g., Latin lower-case letter "a" (U+0061) and Cyrillic lower-case letter "а" (U+0430)). In some scripts, variant labels are also needed to promote the usability of IDNs; for example, the Simplified Chinese and Traditional Chinese forms of a label both need to be allocatable.
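To make the Latin and Cyrillic example concrete, here is a small Python sketch that uses only the standard unicodedata module. The helper name describe is made up for this illustration; it simply prints the code point and the Unicode character name, showing that the two letters look alike but are different characters.

```python
import unicodedata

latin_a = "\u0061"     # Latin lower-case letter a
cyrillic_a = "\u0430"  # Cyrillic lower-case letter a

def describe(ch):
    """Return the code point and Unicode name of one character (illustrative helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

print(describe(latin_a))
print(describe(cyrillic_a))

# The glyphs may look identical, but the strings are not equal.
print(latin_a == cyrillic_a)
```

A label written with one letter is therefore a different string from a label written with the other, which is why such pairs have to be identified as variants.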

GPs are tasked with developing the sets of rules regarding repertoire, variant code points, and label formation. GPs are made up of volunteers, including linguistic specialists and script community representatives with a deep understanding of local culture, customs, and practices. They also include people familiar with registry and registrar operations, as well as policy and DNS experts. For each GP, the main objectives are to:

  • Shortlist the characters that can be used in a domain name for a specific script.
  • Identify code points that need to be considered the "same" or variants in order to reduce end-user confusion and to support usability.
  • Define script-specific rules to prevent security issues.
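As a rough illustration of the first two objectives, the Python sketch below models a repertoire as a set of allowed characters and variants as a lookup table. All data and names here (REPERTOIRE, VARIANTS, is_valid_label, variant_labels) are hypothetical toy values chosen for this example; they are not taken from any RZ-LGR proposal, and the published rules use a much richer format.

```python
from itertools import product

# Toy repertoire: the only characters this pretend rule set allows.
REPERTOIRE = {"a", "b", "c", "\u0430"}  # "\u0430" is Cyrillic a

# Toy variant table: characters treated as the "same" for users.
VARIANTS = {"a": {"\u0430"}, "\u0430": {"a"}}

def is_valid_label(label):
    """A label is valid here if it is non-empty and uses only repertoire characters."""
    return len(label) > 0 and all(ch in REPERTOIRE for ch in label)

def variant_labels(label):
    """Return every other label obtained by swapping characters for their variants."""
    options = [sorted({ch} | VARIANTS.get(ch, set())) for ch in label]
    return {"".join(combo) for combo in product(*options)} - {label}

print(is_valid_label("cab"))   # all characters are in the toy repertoire
print(is_valid_label("cad"))   # "d" is not in the toy repertoire
print(variant_labels("cab"))   # the same label with Cyrillic a substituted
```

Script-specific rules, the third objective, would add further checks on top of this, such as restricting where a character may appear within a label.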

Here is an example of a security issue addressed by a GP:

Case   Code Points              Glyph*
1      U+0067 U+0303 U+0303     g̃
2      U+0067 U+0303            g̃

*As shown in the Chrome browser address bar. Chrome Version 97.0.4692.71 (Official Build) (x86_64) for Mac.

In this example, Case 1 and Case 2 have different code point sequences, but they render the same visually. Case 1 has two consecutive combining tildes (U+0303 U+0303), while Case 2 has only one (U+0303). The Latin script RZ-LGR proposal that handled this issue does not include U+0303 as a single code point, so Case 1 cannot be formed and the potential security issue is avoided.
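The effect of leaving U+0303 out as a single code point can be sketched in Python. The repertoire below is a hypothetical two-entry toy, not the contents of the Latin proposal, and the greedy longest-match check in can_form is a simplification assumed for this illustration rather than the procedure's actual algorithm.

```python
import unicodedata

CASE_1 = "\u0067\u0303\u0303"  # g followed by two combining tildes
CASE_2 = "\u0067\u0303"        # g followed by one combining tilde

# Toy repertoire: "g" alone and the sequence "g + U+0303" are allowed,
# but U+0303 on its own is deliberately absent.
REPERTOIRE = {"\u0067", "\u0067\u0303"}

def can_form(label, repertoire=REPERTOIRE):
    """Check whether a label can be split into repertoire entries (greedy, longest first)."""
    longest = max(len(item) for item in repertoire)
    i = 0
    while i < len(label):
        for size in range(longest, 0, -1):
            piece = label[i:i + size]
            if piece in repertoire:
                i += len(piece)
                break
        else:
            return False  # no repertoire entry matches at this position
    return True

# Unicode normalization alone does not make the two cases equal.
print(unicodedata.normalize("NFC", CASE_1) == unicodedata.normalize("NFC", CASE_2))

print(can_form(CASE_1))  # the trailing U+0303 has no matching entry
print(can_form(CASE_2))  # matches the "g + U+0303" sequence
```

Because the second tilde in Case 1 can only be matched as a standalone code point, and no such entry exists, the label is rejected while Case 2 is accepted.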

How long does this work take?

It can take up to a few years for a GP to finalize a script proposal. Once a script proposal is complete, it goes through the Public Comment process and is then reviewed by an Integration Panel made up of experts in linguistics, Unicode, DNS, and IDNs. After review, the Integration Panel incrementally integrates the script proposal into the RZ-LGR. The updated RZ-LGR is again released for Public Comment to verify the integration process before final publication.

Why does this work matter?

The DNS, and the root zone in particular, is a shared global resource. To build a more inclusive and secure multilingual Internet – one that can work for people around the world – the support of scripts, and ultimately IDNs, needs to be carried out in a careful and conservative manner. Without the combined knowledge and dedication of all the GPs and Integration Panel members over the past eight years, a multilingual Internet would not be possible. GPs have laid a foundation for the proper use of scripts that balances the usability and security of domain names in different scripts.

On behalf of ICANN org, I want to once again thank all of the community members and participants involved in GPs from around the world who have helped in these efforts.

Authors

Sarmad Hussain

Sr Director IDN & UA Programs