This file contains the Label Generation Rules (LGR) for the Bengali (Bangla) script, as appropriate for the Root Zone. For more details on this proposal see "Proposal for a Bengali Script Root Zone Label Generation Ruleset (LGR)" [Proposal]. The format of this file follows [RFC 7940].
This LGR covers Assamese, Bengali, Manipuri and a number of other languages written with the Bengali script.
According to Section 5, "Repertoire", in [Proposal], the Bengali LGR contains 61 unique code points and 9 code point sequences. Of the nine sequences, two override a WLE constraint, three restrict U+09BC from appearing in any context other than these sequences, and the remaining four are defined for in-script variants. This brings the total number of elements in the repertoire to 70.
The repertoire is based on [MSR-4], which is a subset of Unicode 6.3 [Unicode 6.3].
Each code point is tagged with the script or scripts that the code point is used with, a category value, and one or more references documenting sufficient justification for inclusion in the repertoire, see "References" below. For code points that are part of the repertoire, comments identify the languages using the code point.
According to Section 6, "Variants", in [Proposal], this LGR defines in-script variants and cross-script variants, which are "Confusing due to deviation from normally perceived character formations by the larger linguistic community". There are three in-script variant sets: two sets of sequences and one set for variation of RA (see Section 6.1). There are four cross-script variant sets: two with Gurmukhi and two with Devanagari (see Section 6.2).
Variant Disposition: The in-script variant pair U+09B0 and U+09F0 is of type “allocatable”. All other variants are of type “blocked”, making labels that differ only by these variants mutually exclusive: whichever such label is allocated first, the other equivalent variant labels are blocked. There is no preference among these variants.
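The disposition logic above can be sketched in Python. This is a minimal illustration, assuming a hypothetical variant table that holds only the allocatable RA pair (U+09B0 and U+09F0); the sequence and cross-script variants are left out because their members are not listed here. The rule "blocked if any variant mapping used is blocked, otherwise allocatable" is a simplification of RFC 7940 action evaluation, not the LGR's exact action list, and `variant_labels` is a hypothetical name.

```python
from itertools import product

# Hypothetical, simplified variant table: only the in-script RA pair from the text.
# Each entry maps a code point to a list of (variant code point, variant type).
VARIANTS = {
    "\u09B0": [("\u09F0", "allocatable")],  # BENGALI LETTER RA -> U+09F0
    "\u09F0": [("\u09B0", "allocatable")],  # U+09F0 -> BENGALI LETTER RA
}


def variant_labels(label: str) -> dict:
    """Return {variant label: disposition} for every variant of `label`."""
    # At each position, keep the original character (type None) or swap in a variant.
    options = [[(ch, None)] + VARIANTS.get(ch, []) for ch in label]
    result = {}
    for combo in product(*options):
        variant = "".join(ch for ch, _ in combo)
        if variant == label:
            continue  # skip the original label itself
        types = {t for _, t in combo if t is not None}
        # Simplified rule: any blocked mapping blocks the whole variant label.
        result[variant] = "blocked" if "blocked" in types else "allocatable"
    return result


# রাত 'night' = RA + VOWEL SIGN AA + TA
print(variant_labels("\u09B0\u09BE\u09A4"))
```

For a label with a single RA, the sketch returns one allocatable variant with U+09F0 in place of U+09B0; a label with two RAs yields three variant labels, one for each non-identity combination.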
Consonants: All consonants contain an implicit vowel. More details in Section "3.3.1 The Consonants" of the [Proposal].
Hasanta: A special sign is needed whenever the implicit vowel of the preceding consonant is suppressed. This sign is also known as the Halant or Virāma. More details in Section "3.3.2 The Implicit Vowel Killer: Hasanta" of the [Proposal].
Vowels: Separate symbols exist for all ‘Swara’ or Vowels in Bengali, which are pronounced independently either at the beginning of the word or after another vowel or consonant sound. To indicate a Vowel sound other than the implicit one, a Vowel sign (Mātrā) is attached to the consonant. More details in Section "3.3.3 Vowels" of the [Proposal].
Anusvara: The Anusvara represents a homorganic nasal. It replaces a conjunct of Nasal Consonant + Halant + Consonant in which the nasal belongs to the same barga (set) as the following consonant. Before a non-barga consonant, the Anusvara represents a nasal sound. More details in Section "3.3.4 The Anusvara" of the [Proposal].
Candrabindu: Candrabindu denotes nasalization of the preceding vowel as in চাঁদ /cãd/ ‘moon’ (U+099A U+09BE U+0981 U+09A6). This sign with a dot inside the half-moon mark is used as nasalization marker in many Indian scripts. More details in Section "3.3.5 Nasalization: Candrabindu" of the [Proposal].
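To see the stored order behind the rendered word, the following Python sketch lists the code points of চাঁদ with their Unicode names, using only the standard `unicodedata` module; `code_points` is a hypothetical helper name. It shows that the Candrabindu (U+0981) is stored after the vowel sign it nasalizes.

```python
import unicodedata


def code_points(word: str) -> list:
    """List (U+XXXX, Unicode name) for each code point in `word`."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in word]


# চাঁদ 'moon', written with escapes so each code point is explicit
for cp, name in code_points("\u099A\u09BE\u0981\u09A6"):
    print(cp, name)
```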
Visarga and Avagraha: The Visarga U+0983 is frequently used in Bengali loanwords borrowed from Sanskrit and represents a sound very close to /h/. More details in Section "3.3.7 Visarga and Avagraha" of the [Proposal].
Ya-phala: There are two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). More details in Section "3.3.9 Use of Ya-phala" of the [Proposal].
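The Ya-phala exception can be expressed as a context check on Hasanta. The Python sketch below is hypothetical and is not one of the LGR's WLE rules: it assumes that Hasanta is otherwise allowed only after a consonant, and that the exception is A or E followed by Hasanta and BENGALI LETTER YA (U+09AF). The consonant test uses the simplified range U+0995 to U+09B9, which also covers a few unassigned code points and omits others such as U+09F0.

```python
HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA
YA = "\u09AF"       # BENGALI LETTER YA
YA_PHALA_VOWELS = {"\u0985", "\u098F"}  # BENGALI LETTER A, BENGALI LETTER E


def is_consonant(ch: str) -> bool:
    # Simplified assumption: treat the whole U+0995..U+09B9 range as consonants.
    return len(ch) == 1 and 0x0995 <= ord(ch) <= 0x09B9


def hasanta_context_ok(label: str) -> bool:
    """Hypothetical check: each Hasanta follows a consonant, or is A/E + Hasanta + YA."""
    for i, ch in enumerate(label):
        if ch != HASANTA:
            continue
        prev = label[i - 1] if i > 0 else ""
        nxt = label[i + 1] if i + 1 < len(label) else ""
        if is_consonant(prev):
            continue
        if prev in YA_PHALA_VOWELS and nxt == YA:
            continue  # the Ya-phala exception
        return False
    return True


print(hasanta_context_ok("\u0985\u09CD\u09AF\u09BE"))  # অ + Hasanta + YA + AA
print(hasanta_context_ok("\u0986\u09CD\u09AF"))        # আ + Hasanta + YA
```

The first call is accepted through the exception and prints `True`; the second prints `False` because U+0986 (AA) is not one of the two permitted vowels.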
Ra-phala and Ref Sequences: RA+Hasanta (Repha or Ra-phala sequences). More details in Section "3.3.10 Ra-phala and Ref Sequences" of the [Proposal].
Nukta (U+09BC) is listed by itself in the repertoire, but it may appear only as part of three sequences. More details in Section "3.3.6 Nukta" of the [Proposal].
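Unicode normalization is relevant here: the precomposed letters U+09DC, U+09DD and U+09DF are composition exclusions, so NFC always yields a base consonant followed by U+09BC. The Python sketch below assumes these three decompositions are the three Nukta sequences in the LGR (to be confirmed against the LGR's `<char>` entries) and rejects a Nukta in any other position; `nukta_ok` is a hypothetical name.

```python
import unicodedata

NUKTA = "\u09BC"
# Assumed Nukta sequences: canonical decompositions of RRA, RHA and YYA.
NUKTA_SEQUENCES = {
    "\u09DC": "\u09A1\u09BC",  # RRA = DDA + NUKTA
    "\u09DD": "\u09A2\u09BC",  # RHA = DDHA + NUKTA
    "\u09DF": "\u09AF\u09BC",  # YYA = YA + NUKTA
}
ALLOWED_BASES = {seq[0] for seq in NUKTA_SEQUENCES.values()}


def nukta_ok(label: str) -> bool:
    """Hypothetical check: every Nukta directly follows one of the allowed bases."""
    # Root Zone labels should already be NFC; normalizing also decomposes U+09DC/DD/DF.
    label = unicodedata.normalize("NFC", label)
    return all(
        i > 0 and label[i - 1] in ALLOWED_BASES
        for i, ch in enumerate(label)
        if ch == NUKTA
    )


# NFC keeps these three letters decomposed (they are composition exclusions).
for pre, seq in NUKTA_SEQUENCES.items():
    assert unicodedata.normalize("NFC", pre) == seq
```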
Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D) are not included in the repertoire. More details in Section "3.3.8 Zero Width Non-joiner (U+200C) and Zero Width Joiner (U+200D)" of the [Proposal].
The LGR includes the set of required default WLE rules and actions applicable to the Root Zone and defined in [MSR-4]. They are marked with ⍟.
These rules have been formulated so that they can be adopted in the LGR specification.
The following symbols are used in the WLE rules:
C → Consonant
M → Kar (Matra)
V → Vowel
B → Onushshar (Anusvara)
X → Biśarga (Visarga)
D → Candrabindu
H → Hasanta (Halant)
Z → KhandaTa
P → Ra-Hasanta
S → (a/e) Ya-phalā
The rules are:
More details in Section "7 Whole Label Evaluation Rules (WLE)" of the [Proposal].
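The WLE rules are written over the symbols listed above rather than over raw code points. The Python sketch below maps a label to its symbol string using simplified Unicode ranges (an assumption; the LGR's actual classes come from its repertoire and tags) and omits P, S and Nukta, which involve sequences. As an illustration, it applies one check taken from IDNA2008 (RFC 5891, Section 4.2.3.2) rather than from this LGR: a label must not begin with a combining mark, which here means the symbols M, H, B, X or D. All function names are hypothetical.

```python
# Symbols for single code points (names from the Unicode Standard).
SINGLE = {
    "\u0981": "D",  # BENGALI SIGN CANDRABINDU
    "\u0982": "B",  # BENGALI SIGN ANUSVARA
    "\u0983": "X",  # BENGALI SIGN VISARGA
    "\u09CD": "H",  # BENGALI SIGN VIRAMA (Hasanta)
    "\u09CE": "Z",  # BENGALI LETTER KHANDA TA
}
LEADING_MARKS = set("MHBXD")  # symbols whose code points are combining marks


def symbol(ch: str) -> str:
    """Map one code point to a WLE symbol using simplified ranges."""
    if ch in SINGLE:
        return SINGLE[ch]
    cp = ord(ch)
    if 0x0985 <= cp <= 0x0994:
        return "V"  # independent vowels
    if 0x0995 <= cp <= 0x09B9 or cp == 0x09F0:
        return "C"  # consonants, including U+09F0
    if 0x09BE <= cp <= 0x09CC:
        return "M"  # dependent vowel signs (kar)
    raise ValueError(f"U+{cp:04X} is not classified in this sketch")


def symbolize(label: str) -> str:
    """Concatenate the WLE symbols of every code point in `label`."""
    return "".join(symbol(ch) for ch in label)


def starts_with_mark(label: str) -> bool:
    """True if the label begins with a combining mark (an IDNA2008 restriction)."""
    sym = symbolize(label)
    return bool(sym) and sym[0] in LEADING_MARKS


print(symbolize("\u099A\u09BE\u0981\u09A6"))  # চাঁদ
print(symbolize("\u0995\u09CD\u09B7"))        # ক্ষ
```

চাঁদ maps to `CMDC`, and the conjunct ক্ষ maps to `CHC`, which makes visible that conjuncts are stored as consonant, Hasanta, consonant. Writing rules as patterns over such strings keeps them independent of the individual code points in each class.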
Under the Neo-Brahmi Generation Panel (GP), there are many different scripts belonging to separate Unicode blocks. Each of these scripts has been assigned a separate LGR; however, the Neo-Brahmi GP ensured that the fundamental philosophy behind building these LGRs is consistent across all Brahmi-derived scripts. This is the Bengali LGR, which caters to the Bengali language and the other languages written in the Bengali script.
The following references are cited in this document:
References [101] through [128] listed below document the use of specific code points.
]]>