The South Asian Eleven - Progress on Supporting IDNs in Scripts from the Region
We recently traveled to South Asia to initiate and support community-driven efforts to develop the Internet's Root Zone Label Generation Rules (RZ-LGR), which will further enable a multilingual Internet. The trip included meetings for the Neo-Brahmi, Sinhala, and Thaana Generation Panels.
We started our series in the Maldives, where we were hosted by the Communications Authority of Maldives (CAM) for a two-day engagement on the domain name system and cybersecurity. Next on the agenda was a day of training on Internationalized Domain Names (IDNs), as well as the purpose and design of the RZ-LGR. This training was attended by members of the Computer Society of Maldives, the Dhivehi Academy, and local experts working on software for Thaana, the Maldivian script.
Participants explained how the Thaana script is fairly regular in its use of consonants and vowel marks to write the Dhivehi language in a right-to-left direction. Interestingly, we also learned that the community mixes scripts. For example, they use both the Thaana and Arabic scripts to write the name Abdullah. Following our meetings, the community is now interested in forming a Generation Panel (GP) for the Thaana script to develop its proposal for the RZ-LGR.
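To make the two observations above concrete (consonants plus vowel marks written right to left, and the mixing of Thaana with Arabic), here is a minimal Python sketch using only the standard library. It is an illustration, not part of any RZ-LGR tooling: the helper names `describe`, `scripts_in`, and `is_mixed_script` are our own, the sample strings are just the word Dhivehi and three Arabic letters, and the script of each character is approximated from the first word of its Unicode character name, since Python's `unicodedata` module does not expose the Unicode Script property directly.

```python
import unicodedata


def describe(label):
    """Return (code point, general category, bidi class) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.bidirectional(ch))
        for ch in label
    ]


def scripts_in(label):
    """Approximate the set of scripts in a label.

    Assumption: the first word of the Unicode character name (for example
    THAANA or ARABIC) stands in for the script. This is a heuristic only.
    """
    found = set()
    for ch in label:
        name = unicodedata.name(ch, "")
        if name:
            found.add(name.split()[0])
    return found


def is_mixed_script(label):
    """True when the heuristic above sees more than one script."""
    return len(scripts_in(label)) > 1


# The word "Dhivehi" in Thaana: each consonant is followed by a vowel mark.
dhivehi = "\u078b\u07a8\u0788\u07ac\u0780\u07a8"
# Three Arabic letters (ain, beh, dal), used here only as a sample.
arabic_part = "\u0639\u0628\u062f"

for row in describe(dhivehi):
    # Letters report category Lo with a right-to-left bidi class;
    # vowel marks report category Mn (nonspacing mark).
    print(row)

print(scripts_in(dhivehi))
print(is_mixed_script(dhivehi + arabic_part))
```

A check like this only flags that scripts are mixed in a string. Whether a given mix is acceptable in a label is a decision for the relevant Generation Panel.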
The Maldives meetings were productive and proved to be a great warm-up, or net practice, for the five-day engagement in Colombo, Sri Lanka, where we supported discussions on the development and review of multiple script-based proposals. Over the weekend, we prepared to bat with the Sinhala script team members, many of whom have long been involved with Sinhala Unicode standardization and second-level domain names.
We were hosted by Theekshana, a non-profit company associated with the University of Colombo School of Computing (UCSC), with our meeting room overlooking the beautiful cricket grounds on the UCSC campus.
The Sinhala (සිංහල) script used to be written on palm leaves and is drawn in curves because straight lines could tear the leaves. It is also used to write Sanskrit and Pali texts. During the meetings, we explained the constraints imposed by the procedure for developing the RZ-LGR and discussed their implications for the Sinhala script. The panel was also briefed on the label-level rules being developed for the Devanagari script by the Neo-Brahmi Generation Panel (NBGP) to organize the Akshara constraints, as an example for organizing Sinhala characters. Finally, the panel reviewed other scripts and found possible cross-script variant character cases with the Kannada, Malayalam, Myanmar, and Telugu scripts, which it intends to investigate further. The meeting ended with a media briefing where the Sinhala GP chair, Sri Lanka's representative on the Governmental Advisory Committee (GAC), and ICANN org talked about the importance of IDNs and announced the formation of the Sinhala GP.
The Neo-Brahmi Generation Panel (NBGP) was also convening in Colombo for its face-to-face meeting. The Neo-Brahmi and Sinhala GP teams started with a joint session focused on matters of mutual interest, with members from India, Nepal, and Sri Lanka (we hope Bangladesh will join soon). These matters included a consistent framework for label-level rules for South Asian scripts, cross-script variants between the Sinhala script and the scripts covered by the NBGP, and the proposal for the Tamil script being developed by the NBGP.
The NBGP had set itself an aggressive target for its meeting. After discussing Tamil, the next meeting featured a formidable line-up of experts to deliberate on the Gujarati, Gurmukhi, Kannada, Malayalam, and Telugu scripts.
Work on these scripts has progressed significantly. The first complete drafts were reviewed by the panel, which identified challenges and possible ways forward. Experts working on the Kannada script identified characters which haven't been used in nearly 50 years and eventually excluded them from the RZ-LGR proposal.
The panel had a detailed discussion on cross-script variant characters to agree on a mechanism for identifying them: should variant characters be strictly homoglyphs, or should they also include other confusing characters? The panel converged on a solution that tags each pair of candidate cross-script variant characters with one of three colors, marking the pair as indistinguishable, similar, or distinct, based on independent feedback from members representing both relevant scripts. The GP identified multiple script pairs for the cross-script variant analysis, including Devanagari-Gurmukhi (देवनागरी - ਗੁਰਮੁਖੀ), Kannada-Telugu (ಅಕ್ಷರಮಾಲೆ - తెలుగు లిపి), and Tamil-Malayalam (தமிழ் - മലയാളലിപി).
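As a rough illustration of how such a tagging scheme could be recorded, here is a minimal Python sketch. It rests on assumptions of our own rather than on the panel's actual procedure: the names `Similarity`, `tag_pair`, and `tag_candidates` are hypothetical, the character pairs are placeholders, and the rule for combining the two independent ratings (the more cautious rating wins) is our guess.

```python
from enum import IntEnum


class Similarity(IntEnum):
    """The three categories discussed by the panel, ordered by confusability."""
    DISTINCT = 0
    SIMILAR = 1
    INDISTINGUISHABLE = 2


def tag_pair(rating_a, rating_b):
    """Combine the independent ratings from the two scripts' members.

    Assumption: when the two sides disagree, the more cautious
    (more confusable) rating is kept.
    """
    return max(rating_a, rating_b)


def tag_candidates(ratings):
    """Map each candidate pair to its combined tag.

    `ratings` maps (char_a, char_b) to a tuple holding the rating from
    script A's members and the rating from script B's members.
    """
    return {pair: tag_pair(a, b) for pair, (a, b) in ratings.items()}


# Placeholder pairs with illustrative ratings, not real panel data.
example = {
    ("script_A_char_1", "script_B_char_1"): (Similarity.INDISTINGUISHABLE, Similarity.SIMILAR),
    ("script_A_char_2", "script_B_char_2"): (Similarity.DISTINCT, Similarity.DISTINCT),
}

for pair, tag in tag_candidates(example).items():
    print(pair, tag.name)
```

Keeping the two ratings separate until the final step preserves the independence of the feedback, and makes disagreements between the two sides easy to spot and discuss.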
We concluded our trip with a visit to the Cricket Club Café, a popular eatery in Colombo. We were surrounded by nostalgic cricket memorabilia, including a bat with the signatures of the Sri Lankan eleven who won the 1996 Cricket World Cup. Although we were as tired as we would be after playing a five-day cricket match, we were also enthused by the immense energy the community members brought to the discussions. We were driven by their passion to speak up for their languages and scripts. As we perused the menu, we were silently mindful of the line-up of the eleven scripts covered by the three panels we had been supporting on this trip. These scripts exhibit a glorious variety in their style while sharing, at a deeper level, a unifying South Asian abugida writing culture.