
The South Asian Eleven - Progress on Supporting IDNs in Scripts from the Region

We recently traveled to South Asia to initiate and support community-driven efforts for developing the Internet's Root Zone Label Generation Rules (RZ-LGR), which will further enable a multilingual Internet. The trip included meetings for the Neo-Brahmi, Sinhala and Thaana Generation Panels.

Thaana Script

We started our series in the Maldives, where we were hosted by the Communications Authority of Maldives (CAM) for a two-day engagement on the domain name system and cybersecurity. Next on the agenda was a day of training on Internationalized Domain Names (IDNs), as well as the purpose and design of the RZ-LGR. This training was attended by members of the Computer Society of Maldives, the Dhivehi Academy and local experts working on software for Thaana, the Maldivian script.


Participants explained how the Thaana script is fairly regular in its use of consonants and vowel marks to write the Dhivehi language in a right-to-left direction. Interestingly, we also learned that the community mixes scripts. For example, they use the Thaana and Arabic scripts together to write the name Abdullah. Following our meetings, there is now interest from the community in forming a Generation Panel (GP) for the Thaana script to develop its proposal for the RZ-LGR.

Sinhala Script

The Maldives meetings were productive and proved to be a great warm-up, or net practice, for the five-day engagement in Colombo, Sri Lanka, where we supported discussions on the development and review of multiple script-based proposals. Over the weekend, we prepared to bat with the Sinhala script team members, many of whom have long been involved with Sinhala Unicode standardization and second-level domain names.


We were hosted by Theekshana, a non-profit company associated with the University of Colombo School of Computing (UCSC), with our meeting room overlooking the beautiful cricket grounds on the UCSC campus.

The Sinhala (සිංහල) script used to be written on palm leaves and is drawn in curves because straight lines could tear the leaves. It is used to write Sanskrit and Pali texts. During the meetings, we explained the constraints imposed by the procedure to develop the RZ-LGR and discussed the implications for the Sinhala script. The panel was also briefed on the label-level rules being developed for the Devanagari script by the Neo-Brahmi Generation Panel (NBGP) to organize the Akshara constraints, as an example for organizing Sinhala characters. Finally, the panel reviewed other scripts and found possible cross-script variant character cases with the Kannada, Malayalam, Myanmar and Telugu scripts, which it intends to investigate further. The meeting ended with a media briefing where the Sinhala GP chair, Sri Lanka's representative on the Governmental Advisory Committee (GAC), and ICANN org talked about the importance of IDNs and announced the formation of the Sinhala GP.

Neo-Brahmi Scripts

The Neo-Brahmi Generation Panel (NBGP) was also convening in Colombo for its face-to-face meeting. The Neo-Brahmi and Sinhala GP teams started with a joint session, with members from India, Nepal, and Sri Lanka (we hope Bangladesh will join soon), focused on matters of mutual interest. These included a consistent framework for label-level rules for South Asian scripts, cross-script variants between the Sinhala script and the scripts covered by the NBGP, and the proposal for the Tamil script being developed by the NBGP.

The NBGP had set an aggressive goal to hit at its meeting. After discussing Tamil, the next meeting featured a formidable line-up of experts to deliberate on the Gujarati, Gurmukhi, Kannada, Malayalam, and Telugu scripts.

Work on these scripts has progressed significantly. The first complete drafts were reviewed by the panel, which identified challenges and possible ways forward. Experts working on the Kannada script identified characters which haven't been used in nearly 50 years and eventually excluded them from the RZ-LGR proposal.

The panel had a detailed discussion on cross-script variant characters to determine an agreeable mechanism to identify them: should variant characters be strictly homoglyphs, or should they also include other confusing characters? The panel converged on a solution: tag each pair of candidate cross-script variant characters with one of three colors, marking the pair as indistinguishable, similar or distinct, based on independent feedback from members from both relevant scripts. The GP identified multiple script pairs including Devanagari-Gurmukhi (देवनागरी - ਗੁਰਮੁਖੀ), Kannada-Telugu (ಅಕ್ಷರಮಾಲೆ - తెలుగు లిపి) and Tamil-Malayalam (தமிழ் - മലയാളലിപി) for the cross-script variant analysis.
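As a rough illustration only, the three-way tagging described above could be modeled as follows. The names `Similarity` and `combine_ratings`, and in particular the rule for merging the two independent ratings, are assumptions made for this sketch. The post does not say how the panel reconciles differing feedback.

```python
from enum import IntEnum


class Similarity(IntEnum):
    """The three categories named in the post, ordered by confusability."""
    DISTINCT = 0
    SIMILAR = 1
    INDISTINGUISHABLE = 2


def combine_ratings(rating_a: Similarity, rating_b: Similarity) -> Similarity:
    """Merge independent ratings from reviewers of the two scripts.

    Hypothetical rule: keep the more cautious (more confusable) rating.
    The panel's actual reconciliation procedure is not described.
    """
    return max(rating_a, rating_b)


# One reviewer sees the pair as similar, the other as distinct.
tag = combine_ratings(Similarity.SIMILAR, Similarity.DISTINCT)
print(tag.name)
```

Ordering the categories lets a conservative merge be expressed as a simple `max`. A real procedure might instead send disagreements back to the panel for discussion.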

Looking Back

We concluded our trip with a visit to the Cricket Club Café, a popular eatery in Colombo. We were surrounded by nostalgic cricket memorabilia, including a bat with the signatures of the Sri Lankan eleven who won the 1996 Cricket World Cup. Although we were as tired as we would be after playing a five-day cricket match, we were also enthused by the immense energy the community members brought to the discussions. We were driven by their passion to speak up for their languages and scripts. As we perused the menu, we were silently mindful of the line-up of the eleven scripts covered by the three panels we had been supporting on this trip. These scripts exhibit a glorious variety in style while sharing, at a deeper level, a unifying South Asian abugida writing culture.




Internationalized Domain Name (IDN)

IDNs are domain names that include characters used in the local representation of languages that are not written with the twenty-six letters of the basic Latin alphabet "a-z". An IDN can contain Latin letters with diacritical marks, as required by many European languages, or may consist of characters from non-Latin scripts such as Arabic or Chinese. Many languages also use other types of digits than the European "0-9". The basic Latin alphabet together with the European-Arabic digits are, for the purpose of domain names, termed "ASCII characters" (ASCII = American Standard Code for Information Interchange). These are also included in the broader range of "Unicode characters" that provides the basis for IDNs.

The "hostname rule" requires that all domain names of the type under consideration here are stored in the DNS using only the ASCII characters listed above, with the one further addition of the hyphen "-". The Unicode form of an IDN therefore requires special encoding before it is entered into the DNS.

The following terminology is used when distinguishing between these forms. A domain name consists of a series of "labels" (separated by "dots"). The ASCII form of an IDN label is termed an "A-label". All operations defined in the DNS protocol use A-labels exclusively. The Unicode form, which a user expects to be displayed, is termed a "U-label". The difference may be illustrated with the Hindi word for "test", परीका, appearing here as a U-label would (in the Devanagari script). A special form of "ASCII compatible encoding" (abbreviated ACE) is applied to this to produce the corresponding A-label: xn--11b5bs1di. A label that only includes ASCII letters, digits, and hyphens is termed an "LDH label". Although the definitions of A-labels and LDH labels overlap, a name consisting exclusively of LDH labels is not an IDN.
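The U-label to A-label conversion described above can be sketched with Python's built-in `punycode` codec. This is a minimal illustration, not a full IDNA implementation: the helper names `to_a_label` and `to_u_label` are made up for this example, and the normalization and validity checks that a real registry or resolver library would apply are skipped here.

```python
ACE_PREFIX = "xn--"  # prefix that marks an ASCII-compatible-encoded label


def to_a_label(u_label: str) -> str:
    """Convert a single Unicode label to its ASCII (A-label) form.

    Illustrative only: no normalization or validity checks are applied.
    """
    if u_label.isascii():
        return u_label  # already an LDH-style label, nothing to encode
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Convert an A-label back to the Unicode form shown to users."""
    if not a_label.lower().startswith(ACE_PREFIX):
        return a_label  # not ACE-encoded, return unchanged
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


label = "परीका"  # the example string used in the glossary entry above
print(to_a_label(label))               # xn--11b5bs1di
print(to_u_label(to_a_label(label)))   # round-trips to the original label
```

A full domain name would be processed label by label, splitting on the dots, since each label is encoded independently.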