This file contains the Label Generation Rules (LGR) for the Telugu script as appropriate for the Root Zone. For more details, see "Proposal for a Telugu Script Root Zone Label Generation Ruleset (LGR)" [Proposal]. The format of this file follows [RFC 7940].
According to Section 5, "Repertoire" in [Proposal], the Telugu LGR contains 63 unique code points.
The repertoire is based on [MSR-4], which is a subset of Unicode 6.3 [Unicode 6.3].
Each code point is listed with its glyph, character name, Unicode General Category (gc), EGIDS status, Indic Syllabic Category, and reference.
According to Section 6, "Variants", in [Proposal], this LGR defines cross-script variants which are "Confusing due to deviation from normally perceived character formations by larger linguistic community". These are not cases of mere visual similarity: they can confuse even a careful observer and are therefore proposed as variants.
Variant Disposition: All variants are of type “blocked”, which makes labels that differ only by these variants mutually exclusive: once one such label has been chosen, every equivalent variant label is blocked. There is no preference among these variants.
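The effect of the “blocked” disposition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variant pair ("x", "y") is a hypothetical placeholder and is not taken from this LGR, and the names `build_canon`, `variant_key`, and `Registry` are invented for the example.

```python
def build_canon(pairs):
    """Group code points into variant sets and pick one representative per set."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # No variant is preferred; the smaller code point is used
            # only as an arbitrary, stable representative.
            parent[max(ra, rb)] = min(ra, rb)
    return {cp: find(cp) for cp in list(parent)}


def variant_key(label, canon):
    """Labels that differ only by variants share the same key."""
    return "".join(canon.get(cp, cp) for cp in label)


class Registry:
    """First label of a variant set wins; later equivalents are blocked."""

    def __init__(self, canon):
        self.canon = canon
        self.taken = {}

    def register(self, label):
        key = variant_key(label, self.canon)
        if key in self.taken:
            return False  # blocked by an earlier equivalent label
        self.taken[key] = label
        return True


# Hypothetical placeholder pair, NOT a variant defined by this LGR.
canon = build_canon([("x", "y")])
registry = Registry(canon)
```

With this placeholder mapping, registering "axb" succeeds, a later attempt to register "ayb" is refused, and an unrelated label such as "azb" is unaffected.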
The Telugu orthography superficially resembles a series of circles and semi-circles. Most consonants carry a tick mark called ‘talakattu’. The writing system is classified as an abugida, also known as an alphasyllabary. The alphabet consists of vowels, consonants and modifiers. Each vowel and consonant has one or more secondary allographs.
Vowels and vowel modifiers: There are fourteen vowel characters in the common inventory, viz. అ, ఆ, ఇ, ఈ, ఉ, ఊ, ఋ, ఌ, ఎ, ఏ, ఐ, ఒ, ఓ, ఔ, and two (ౠ, ౡ) which are obsolete. Each member of the common inventory has one or more secondary forms, depending on the size and shape of the consonant that functions as an anchor. For more details, see Section "3.5.1 The vowels and vowel modifiers" of the [Proposal].
Anusvara or sunna: The Anusvara or sunna represents a homorganic nasal before the corresponding consonant and also serves as a substitute for transcribing word-final /mu/. Essentially, it substitutes for a cluster of a Nasal Consonant+Halant before a consonant. For more details, see Section "3.5.2 The Anusvara or sunna" of the [Proposal].
Consonants: The Telugu consonants contain an implicit vowel /a/. For more details, see Section "3.5.4 The Consonants" of the [Proposal].
Halant: A special sign is needed whenever the implicit vowel of the preceding consonant is stripped off. This sign is known as the Halant. (Any vowel sign also removes the implicit vowel.) For more details, see Section "3.5.1 The vowels and vowel modifiers" of the [Proposal].
The LGR includes the set of required default WLE rules and actions applicable to the Root Zone and defined in [MSR-4]. They are marked with ⍟.
The rules have been formulated so that they can be adopted directly in the LGR specification.
The following symbols are used in the WLE rules:
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
X → Visarga
H → Halant / Virama
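As an illustration of how these symbols relate to code points, the following Python sketch maps each character of a label to its symbol. It is not part of the LGR: the ranges are taken from the Unicode Telugu block and are an assumption, since they are wider than the 63-code-point repertoire of this LGR, and the function names `wle_symbol` and `symbol_string` are hypothetical.

```python
ANUSVARA = 0x0C02  # TELUGU SIGN ANUSVARA
VISARGA = 0x0C03   # TELUGU SIGN VISARGA
VIRAMA = 0x0C4D    # TELUGU SIGN VIRAMA (Halant)


def wle_symbol(ch):
    """Return the WLE symbol for one character, or None if it has none.

    Assumption: plain Unicode block ranges are used. They include a few
    unassigned or excluded code points that the real repertoire omits.
    """
    cp = ord(ch)
    if cp == ANUSVARA:
        return "B"
    if cp == VISARGA:
        return "X"
    if cp == VIRAMA:
        return "H"
    if 0x0C05 <= cp <= 0x0C14:  # independent vowels A..AU
        return "V"
    if 0x0C15 <= cp <= 0x0C39:  # consonants KA..HA
        return "C"
    if 0x0C3E <= cp <= 0x0C4C:  # vowel signs (matras) AA..AU
        return "M"
    return None


def symbol_string(label):
    """Spell a label as a string of WLE symbols; '?' marks other characters."""
    return "".join(wle_symbol(ch) or "?" for ch in label)


print(symbol_string("తెలుగు"))  # CMCMCM
print(symbol_string("ఆంధ్ర"))    # VBCHC
```

A symbol string of this kind is a convenient form against which context rules over C, M, V, B, X and H can be checked.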
The rules are:
For more details, see Section "7 Whole Label Evaluation Rules (WLE)" of the [Proposal].
The Neo-Brahmi Generation Panel covers many different scripts belonging to separate Unicode blocks. Each of these scripts will be assigned a separate LGR; however, the Neo-Brahmi GP will ensure that the fundamental philosophy behind building each LGR is in sync with that of all other Brahmi-derived scripts. This is the Telugu LGR, which caters to the Telugu language written using the Telugu script.
The following references are cited in this document: