Supporting Linguistic Diversity of Africa for the Internet’s Top-Level Domain Names

Thirty different languages – that was the result of a quick poll asking the twenty participants attending the IDN Workshop to list the languages they speak. ICANN organized the workshop at the Africa Internet Summit in Nairobi on 28 May 2017. This response exemplifies the enormous linguistic diversity in Africa, where the use of several languages – or multilingualism – is the norm. There are at least 2,144 languages spoken across the continent, with individual countries such as Nigeria having as many as 520 languages. By way of comparison, 287 languages are spoken in Europe.

Historically, Africa is among the places where written communication was first established, and Egyptian hieroglyphs are among the oldest writing systems discovered. Today, however, the majority of African languages are only spoken and have no written form. Still, estimates show that more than 500 languages have a written form. Not surprisingly, the diversity of the writing systems created by Africans mirrors the diversity of the spoken languages: up to 29 scripts were created in Africa, spanning nearly all known script types, including abjads, abugidas, alphabets, syllabaries, and logo-syllabaries. Of these scripts, 21 may still be in use, and new scripts continue to be created. Some defy current linguistic classifications, such as the colorful Oracle Rainbow Script, created as recently as 1999. Among the more widely used scripts is Tifinagh, an ancient script in use since the 3rd century Before Common Era (BCE). It was revitalized in the 20th century and is now used in a standardized form to teach Berber languages such as Amazigh to pupils in the primary schools of Morocco. For an example, see the primer in Amazigh developed by the Institut Royal de la Culture Amazighe.

Further examples include the Ethiopic script used for many languages in Ethiopia and Eritrea, the Vai syllabary used for the Vai language of Liberia, and N’ko, an alphabet used for a family of languages called Manding in West Africa. Several scripts are now historic and have fallen out of use, while others such as N’ko have viable user communities and can be represented digitally today. However, many scripts lack resources such as fonts or input methods and are not officially supported or recognized.

The most widely used scripts of Africa are foreign scripts introduced historically, namely the Arabic script (referred to as Ajami in some language communities) and the Latin script. These scripts have been extended to represent the additional sounds in local languages of Africa. Examples include the click sounds used by languages of Southern and Eastern Africa, such as lateral clicks. These are written with symbols not considered letters in other languages (such as the double pipe ǁ) or with very complex sequences of letters, such as gǁx’ ([ᶢǁʢ] in the International Phonetic Alphabet) in Juǀʼhoansi, a language of Namibia and Botswana. The same has been done for the Arabic script, with new letters created to represent local sounds such as the prenasalized stop /mb/ or /ᵐbʷ/ in Chimiini, a language of Somalia. Because font support for this letter is limited, see U+08B6 in the Unicode Standard to view its written form.
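When fonts are missing, one way to check what such extended letters are is to look them up by code point. The following is a minimal sketch, assuming Python 3 and its standard `unicodedata` module; the helper name `describe` and the sample labels are illustrative and not part of any ICANN tooling. Whether a name is returned for U+08B6 depends on the Unicode version bundled with the Python build, so the sketch falls back to a placeholder.

```python
import unicodedata


def describe(text):
    """Return (character, 'U+XXXX', Unicode name, general category) per code point.

    `describe` is a hypothetical helper for illustration only.
    """
    rows = []
    for ch in text:
        rows.append((
            ch,
            f"U+{ord(ch):04X}",
            # Older Python builds may not know newer characters; use a placeholder.
            unicodedata.name(ch, "<name not available in this Unicode version>"),
            unicodedata.category(ch),
        ))
    return rows


# Sample strings taken from the paragraph above.
samples = {
    "double pipe (lateral click)": "\u01C1",
    "Juǀʼhoansi letter sequence": "gǁx’",
    "Chimiini letter": "\u08B6",
}

for label, text in samples.items():
    print(label)
    for ch, code_point, name, category in describe(text):
        print(f"  {code_point}  {category}  {name}")
```

The general category printed for each code point (for example `Lo` for "Letter, other") shows whether Unicode treats a symbol such as ǁ as a letter rather than as punctuation.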

Furthermore, the use of multiple scripts by the same language community – called multiscripturalism – is very common in Africa. For example, two versions of the Alphabet National du Tchad (ANT) have been created, one based on the Latin script and the other on the Arabic script. Communities using the Sar language may write it in either script; for example, the word for lion is written as “ɓəl” in ANT Latin and ٻّلْ in ANT Arabic, as shown here.

Extract from Alphabet National du Tchad. The green column gives French translations of words from the Chadian language listed in the blue column; the words appear both in Latin-script-based forms (red column) and in Arabic-script-based forms (yellow column). (Table presented in a proposal by Priest and Hosken from Décret fixant l’Alphabet National du Tchad, 2010.)

ICANN is currently undertaking a program to support Internationalized Domain Names (IDNs) as top-level domains (TLDs). It is developing Label Generation Rules for the Root Zone (RZ-LGR) to support the different scripts. This work is led by community-based panels (called Generation Panels, GPs) which document the use of the script based on the procedure finalized by the community. The Arabic script GP has already finalized its work and supports the major African languages that are written in the Arabic script. More recently, the Ethiopic script GP has also finalized its proposal for integration into the RZ-LGR.

The Latin script GP has also started its work and is investigating the use of the script in Africa, in addition to other continents. Because documentation is limited, it is challenging to determine how the Latin script has been extended to cater to African languages. ICANN has therefore been reaching out to communities in Africa to get them involved in this effort, holding annual IDN workshops in Africa for this purpose: Congo in 2015, Addis Ababa in 2016, and Nairobi in 2017.

While ICANN has received some expressions of interest, more volunteers are needed from Africa for the Latin GP to advance this important work. Please email IDNProgram@icann.org if you are interested in participating or have any queries.

In the context of Africa, the RZ-LGR project currently includes the Arabic, Ethiopic, and Latin scripts. ICANN will support other scripts in Africa for IDN TLDs if they are actively being used by the relevant communities and if those communities can gather sufficient interest to form GPs and develop proposals for the RZ-LGR.

Please visit www.icann.org/idn for more details about the IDN Program at ICANN.

Internationalized Domain Name (IDN): IDNs are domain names that include characters used in the local representation of languages that are not written with the twenty-six letters of the basic Latin alphabet "a-z". An IDN can contain Latin letters with diacritical marks, as required by many European languages, or may consist of characters from non-Latin scripts such as Arabic or Chinese. Many languages also use other types of digits than the European "0-9". The basic Latin alphabet together with the European-Arabic digits are, for the purpose of domain names, termed "ASCII characters" (ASCII = American Standard Code for Information Interchange). These are also included in the broader range of "Unicode characters" that provides the basis for IDNs.

The "hostname rule" requires that all domain names of the type under consideration here are stored in the DNS using only the ASCII characters listed above, with the one further addition of the hyphen "-". The Unicode form of an IDN therefore requires special encoding before it is entered into the DNS.

The following terminology is used when distinguishing between these forms. A domain name consists of a series of "labels" (separated by "dots"). The ASCII form of an IDN label is termed an "A-label". All operations defined in the DNS protocol use A-labels exclusively. The Unicode form, which a user expects to be displayed, is termed a "U-label". The difference may be illustrated with the Hindi word for "test" – परीका – appearing here as a U-label would (in the Devanagari script). A special form of "ASCII compatible encoding" (abbreviated ACE) is applied to this to produce the corresponding A-label: xn--11b5bs1di. A domain name that only includes ASCII letters, digits, and hyphens is termed an "LDH label". Although the definitions of A-labels and LDH labels overlap, a name consisting exclusively of LDH labels, such as "icann.org", is not an IDN.
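The relationship between a U-label and its A-label can be sketched in a few lines. The sketch below assumes Python 3.7 or later and uses only the built-in `punycode` codec. The function names `to_a_label` and `to_u_label` are hypothetical, and the code deliberately omits the validation and normalization steps that real IDN processing requires, so it illustrates the encoding step only and is not a registration-ready implementation.

```python
ACE_PREFIX = "xn--"  # prefix that marks an ASCII-compatible-encoded label


def to_a_label(u_label):
    """Encode one Unicode label into its ASCII form (no IDNA validation)."""
    if u_label.isascii():
        # Pure ASCII input is treated as an LDH label and left unchanged (assumption).
        return u_label
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label):
    """Decode an A-label back to its Unicode form (no IDNA validation)."""
    if not a_label.lower().startswith(ACE_PREFIX):
        return a_label
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


# The example label from the definition above.
print(to_a_label("परीका"))          # xn--11b5bs1di
print(to_u_label("xn--11b5bs1di"))  # परीका
print(to_a_label("icann"))          # icann (an LDH label, not an IDN)
```

A full domain name would be handled label by label, splitting on the dots and encoding only the labels that contain non-ASCII characters.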