This file contains Label Generation Rules (LGR) for the Devanagari script as would be appropriate for the Root zone. For more details on this proposal, see “Proposal for a Devanagari Script Root Zone Label Generation Rule-Set (LGR)” [Proposal]. The format of this file follows [RFC 7940].
The Root Zone LGR for the Devanagari script lists 83 unique code points in addition to 22 sequences, bringing the total number of repertoire entries to 105. The two sequences U+0931 U+094D U+092F (ऱ्य) and U+0931 U+094D U+0939 (ऱ्ह) restrict the character U+0931 (DEVANAGARI LETTER RRA) to these specific contexts; it is not permitted on its own. Accordingly, although U+0931 (ऱ) is not listed by itself, it brings the total number of distinct code points to 84.
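The counting above can be checked with a minimal sketch. The toy repertoire below is only a five-entry subset chosen for illustration (the two RRA sequences plus the three code points they share with stand-alone entries); the function name `summarize` is hypothetical and not part of any LGR tooling.

```python
# Toy subset of the repertoire, for illustration only. Each entry is a
# tuple of code points: length 1 = single code point, longer = sequence.
TOY_REPERTOIRE = [
    (0x092F,),                 # YA, listed by itself
    (0x0939,),                 # HA, listed by itself
    (0x094D,),                 # Halant, listed by itself
    (0x0931, 0x094D, 0x092F),  # RRA + Halant + YA
    (0x0931, 0x094D, 0x0939),  # RRA + Halant + HA
]

def summarize(repertoire):
    """Return (single code points, sequences, total entries, distinct code points)."""
    singles = [entry for entry in repertoire if len(entry) == 1]
    sequences = [entry for entry in repertoire if len(entry) > 1]
    distinct = {cp for entry in repertoire for cp in entry}
    return len(singles), len(sequences), len(repertoire), len(distinct)

print(summarize(TOY_REPERTOIRE))  # (3, 2, 5, 4)
```

In the toy subset, U+0931 occurs only inside sequences, so the number of distinct code points is one more than the number of single-code-point entries. The same reasoning gives 83 + 22 = 105 entries and 83 + 1 = 84 distinct code points for the full repertoire.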
A number of other sequences have been defined in connection with the definition of variants (see "Variants" below).
The repertoire includes code points used by languages written in Devanagari that fall within [EGIDS] scale 1 to 4. Boro, Braj, Dhundari, Mundari and Kharia are also covered. Although listed at EGIDS scale 4, Saraiki is not covered, because the Devanagari script is “no longer in use” by the Saraiki community. For more details, see Section 5 “Repertoire” in [Proposal]. A non-exhaustive list of languages using each code point can be found in the comments.
The repertoire is based on [MSR-4], which is a subset of Unicode 6.3 [Unicode 6.3].
According to Section 6 “Variants” in [Proposal], this LGR defines variants which are “Confusing due to deviation from normally perceived character formations by the larger linguistic community”. These are not cases of mere visual similarity, as they involve deviations from the widely accepted norms of Devanagari Akshar formation. They can cause confusion even to a careful observer and are hence proposed as variants. They fall into three broad categories:
Variant Disposition: All variants are of type “blocked”, making labels that differ only by these variants mutually exclusive: whichever of these labels is chosen first would be delegated, while any other equivalent label should be blocked.
In addition to these, cross-script variant analysis of Devanagari has been carried out by the NBGP. Possible cross-script variant cases were found with the Gurmukhi and Bengali scripts and are listed in Appendix 1 of the [Proposal].
Context Rules for Variants: some of the variants defined in this LGR are "effective null variants", that is, some code points in the source map to "nothing" in the target while all other code points remain unchanged. (Because mappings are symmetric, it does not matter whether it is the forward or reverse mapping that maps to "null".) Such variants require a context rule to keep the variant well behaved. Symmetry requires the same context rule for both forward and reverse mappings.
In other cases, the sequences or code points making up source and target are constrained by context rules on the code points. In such a case, any variants require context rules that match the intersection between the contexts for both source and target; otherwise a sequence might be considered valid in some variant label when it would not be valid in an equivalent context in an original label.
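The effect of the “blocked” disposition can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the placeholder characters "A" and "B" stand in for one variant pair, the names `index_key` and `Registry` are hypothetical, and context rules and multi-code-point sequences are ignored. The actual variant sets are those defined in this LGR.

```python
def index_key(label, variant_sets):
    """Replace each member of a variant set by the set's smallest member."""
    canon = {}
    for variant_set in variant_sets:
        representative = min(variant_set)
        for member in variant_set:
            canon[member] = representative
    return "".join(canon.get(ch, ch) for ch in label)

class Registry:
    """Toy registry: the first label of a variant-equivalent set wins."""

    def __init__(self, variant_sets):
        self.variant_sets = variant_sets
        self.delegated = {}  # index key -> label that was delegated

    def apply(self, label):
        key = index_key(label, self.variant_sets)
        existing = self.delegated.get(key)
        if existing is None:
            self.delegated[key] = label
            return "delegated"
        if existing == label:
            return "already delegated"
        return "blocked"

# Placeholder variant pair; NOT taken from the actual LGR.
registry = Registry([{"A", "B"}])
print(registry.apply("xAy"))  # delegated
print(registry.apply("xBy"))  # blocked
print(registry.apply("xCy"))  # delegated
```

Two labels that differ only by variants share the same index key, so only the first applicant can be delegated; this is what makes such labels mutually exclusive.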
Devanagari is an alphasyllabary, and the heart of the writing system is the akshar. It is this unit that users of the script instinctively recognize. The Devanagari writing system can be summed up as composed of Consonants, Halant, Vowels, Anusvara, Candrabindu, Nukta and Visarga.
Consonants: Devanagari consonants all contain an implicit schwa /ə/. To make a full syllable, consonants may be followed by certain code points from one or more of the other groups (see “WLE rules” below). See Section “3.3.1 The Consonants” of the [Proposal].
Halant: All consonants contain an implicit vowel (schwa). A special sign is needed to denote that this implicit vowel is stripped off. This is known as the Halant (U+094D). The Halant thus joins two consonants and creates conjuncts, which generally combine 2 to 4 consonants. In rare cases, it can join up to 5 consonants. However, this LGR will not enforce any length limit. See Section 3.3.2 “The Implicit Vowel Killer: Halant” in [Proposal].
Vowels: There are separate code points for vowels that are pronounced independently at the beginning of a syllable or after a vowel sound. To indicate a vowel sound other than the implicit schwa following a consonant, a vowel sign (Matra) is attached to the consonant. There is an equivalent Matra for each vowel except U+0905. See Section “3.3.3 Vowels” of the [Proposal].
Anusvara: The Anusvara shows a nasal at the end of a syllable. See Section “3.3.4 The Anusvara” of the [Proposal].
Candrabindu: A Candrabindu denotes nasalization of the preceding vowel. Present-day Hindi users tend to replace the Candrabindu by the Anusvara. See Section “3.3.5 Nasalization: Candrabindu” of the [Proposal].
Nukta: The Nukta sign is placed below a certain number of consonants to represent sounds found only in words borrowed from Perso-Arabic, English and other non-Aryan sources. It is also placed under U+0921 and U+0922 to indicate flapped sounds. Apart from this, the Santali language uses the Nukta adjoined to certain vowels and vowel signs. See Section “3.3.6 Nukta” of the [Proposal].
Visarga: The Visarga (U+0903), representing an aspiration at the end of a syllable, is frequently used in Sanskrit. See Section “3.3.7 Visarga and Avagraha” of the [Proposal].
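The role of the Halant in forming conjuncts, described above, can be sketched as follows. This is a simplified illustration, not the LGR's own rule set: it assumes the basic consonant range U+0915..U+0939 of the Devanagari block, ignores the Nukta, and the function name `conjunct_sizes` is hypothetical.

```python
HALANT = "\u094D"

def is_consonant(ch):
    # Basic consonant range of the Devanagari block (KA .. HA).
    return "\u0915" <= ch <= "\u0939"

def conjunct_sizes(label):
    """Return the number of consonants in each consonant cluster.

    A cluster is a consonant followed by zero or more
    (Halant, consonant) pairs; a size of 1 means no conjunct.
    """
    sizes = []
    i = 0
    n = len(label)
    while i < n:
        if not is_consonant(label[i]):
            i += 1
            continue
        count = 1
        i += 1
        # Extend the cluster while a Halant joins another consonant.
        while i + 1 < n and label[i] == HALANT and is_consonant(label[i + 1]):
            count += 1
            i += 2
        sizes.append(count)
    return sizes

print(conjunct_sizes("\u0915\u094D\u0937"))                    # [2]
print(conjunct_sizes("\u0938\u094D\u0924\u094D\u0930\u0940"))  # [3]
print(conjunct_sizes("\u0915\u092E\u0932"))                    # [1, 1, 1]
```

The sketch only measures cluster sizes; consistent with the text above, it imposes no upper limit on the number of consonants joined by the Halant.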
The LGR includes the set of required default WLE rules and actions applicable to the Root Zone and defined in [MSR-4]. They are marked with ⍟.
These rules ensure that the Devanagari label conforms to akshar formation norms for the Devanagari script. These norms are exclusively presented as context rules.
The following symbols are used in the names and comments for WLE rules:
The rules are:
An additional rule is used only for variants where a Nukta maps to a "null":
See Section “7 Whole Label Evaluation Rules (WLE)” of the [Proposal].
Under the Neo-Brahmi Generation Panel, there are many different scripts belonging to separate Unicode blocks. Each of these scripts has been assigned a separate LGR; however, the Neo-Brahmi GP ensured that the fundamental philosophy behind building each of those LGRs is in sync with that of all other Brahmi-derived scripts. This is the Devanagari LGR, which caters to multiple languages written using Devanagari belonging to EGIDS scale 1 to 4.
For additional details and contributors, see Sections 4 and 8 of the [Proposal].
References [0] to [11] refer to the Unicode Standard versions in which the corresponding code points were initially encoded. References [100] and up correspond to sources given in [Proposal] for justifying the inclusion of the corresponding code points. A single code point or range may have multiple source reference values.
In addition, the following references are cited in this document:
For more details on references [0] and up and [100] and up, refer to the Table of References below.
]]>