Overview

This file contains Label Generation Rules (LGR) for the Bengali (Bangla) script for the Root Zone. This LGR covers Assamese, Bengali, Manipuri and a number of other languages written with the Bengali script. For more details on this LGR and additional background on the script, see “Proposal for a Bengali Script Root Zone Label Generation Ruleset (LGR)” [Proposal-Bengali]. This file is one of a set of LGR files that together form an integrated LGR for the DNS Root Zone [RZ-LGR-6]. The format of this file follows [RFC 7940].

Repertoire

The repertoire contains 61 code points for letters, as well as 9 code point sequences, for a total of 70 repertoire elements. Of the nine sequences, two override a WLE constraint, four were defined for in-script variants, and the remaining three were defined to restrict U+09BC NUKTA from appearing in any context other than these sequences. Accordingly, while U+09BC is not listed by itself, it brings the total of distinct code points to 62. For more detail, see Section 5, “Repertoire” in [Proposal-Bengali].

Note that the code points U+09DC, U+09DD and U+09DF are not stable under Normalization Form C (NFC) and are thus not PVALID under IDNA2008. Before performing lookup, any user agent accepting these code points will normalize them into the equivalent sequences with an explicit U+09BC NUKTA code point. Accordingly, this LGR does not reference these code points, but instead includes only the sequences.
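
The normalization behavior described above can be observed with a short sketch. It uses Python's standard unicodedata module; the helper name nfc_code_points is introduced here for illustration only and is not part of this LGR.

```python
import unicodedata

# The three precomposed letters that this LGR does not reference directly.
PRECOMPOSED = ["\u09DC", "\u09DD", "\u09DF"]

def nfc_code_points(text):
    """Return the NFC form of `text` as a list of 'U+XXXX' strings."""
    normalized = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X}" for ch in normalized]

# Each precomposed letter normalizes to a base letter followed by U+09BC.
for ch in PRECOMPOSED:
    print(f"U+{ord(ch):04X} ->", " ".join(nfc_code_points(ch)))
```

Each result is a two-code-point sequence ending in U+09BC, which is the form in which these letters appear in the repertoire.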

The repertoire is contained in [MSR-6], which is a subset of [Unicode 16.0.0].

As part of the Root Zone, this LGR includes neither decimal digits nor the HYPHEN-MINUS.

Repertoire Listing: Each code point or range is tagged with the script or scripts with which the code point is used and one or more other character categories. For each repertoire element, one or more references document sufficient justification for inclusion in the repertoire; see the “References” below. For code points that are part of the repertoire, comments identify the languages using the code point along with their [EGIDS] level.

Code points outside the Bengali script repertoire that are listed in this file are targets for out-of-repertoire variants and are identified by a reflexive (identity) variant of type “out-of-repertoire-var”. They do not form part of the repertoire.

Variants

This LGR defines in-script variants and cross-script variants as described in Section 6, “Variants”, in [Proposal-Bengali]. There are three in-script variants: two sequence sets and one set for variation of RA. See Section 6.1 of [Proposal-Bengali]. There are six cross-script variants: two sets with Gurmukhi and four sets with Devanagari. See also Section 6.2 of [Proposal-Bengali].

Variant Disposition: The in-script variant pair U+09B0 / U+09F0 is of type “allocatable”, thus allowing access to either user community. All other variants are of type “blocked”, making labels that differ only by these variants mutually exclusive: whichever of the variant labels is allocated first, all other equivalent variant labels are blocked. There is no preference among these variants.
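
As a rough illustration of how variant types translate into a label disposition, the following sketch is loosely modelled on the style of actions described in [RFC 7940]. The function name, the order of the checks, and the treatment of an empty list as the original label are assumptions made for this example; the normative actions are those in the XML of this LGR.

```python
def label_disposition(variant_types):
    """Compute a disposition for a variant label.

    `variant_types` lists the types of the variant mappings that were
    applied to derive the label from the original label (an assumption of
    this sketch). An empty list stands for the original label itself.
    """
    if "out-of-repertoire-var" in variant_types:
        return "invalid"
    if "blocked" in variant_types:
        return "blocked"
    if variant_types and all(t == "allocatable" for t in variant_types):
        return "allocatable"
    return "valid"

# A label derived only through the U+09B0 / U+09F0 mapping stays allocatable;
# one blocked mapping anywhere in the label blocks the whole variant label.
print(label_disposition(["allocatable"]))
print(label_disposition(["allocatable", "blocked"]))
```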

Context Rules for Variants: The Halant is a variant only at the end of a label, where it does not take part in forming a conjunct.

The specification of variants in the Root Zone LGR follows the guidelines in [RFC 8228].

Character Classes

Consonants: All consonants contain an implicit vowel. More details in Section 3.3.1, “The Consonants” of [Proposal-Bengali].

Hasanta: A special sign is needed whenever the implicit vowel in the preceding consonant is stripped off. This symbol is also known as the Halant or “Virama”. The nominal rendering of the Hasanta is visible only at the end of the label. More details in Section 3.3.2, “The Implicit Vowel Killer: Hasanta” of [Proposal-Bengali].

Vowels and Kar (Matra): Separate symbols exist for all “Swara” or vowels in Bengali, which are pronounced independently either at the beginning of the word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign (Kar) is attached to the consonant, analogous to Matra in other Neo-Brahmi scripts. More details in Section 3.3.3, “Vowels” of [Proposal-Bengali].

Anusvara: The Anusvara represents a homorganic nasal. It replaces a conjunct group of a Nasal Consonant+Halant+Consonant belonging to that particular barga or set. Before a non-barga consonant, the anusvara represents a nasal sound. More details in Section 3.3.4, “The Anusvara” of [Proposal-Bengali].

Candrabindu: Candrabindu denotes nasalization of the preceding vowel as in চাঁদ /cãd/ “moon” (U+099A U+09BE U+0981 U+09A6). This sign with a dot inside the half-moon mark is used as nasalization marker in many Indian scripts. More details in Section 3.3.5, “Nasalization: Candrabindu” of [Proposal-Bengali].
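
The code point sequence given for চাঁদ can be checked with a small sketch that lists each code point with its Unicode character name. The helper name describe is hypothetical.

```python
import unicodedata

word = "\u099A\u09BE\u0981\u09A6"  # the example word for "moon"

def describe(text):
    """Return (code point, Unicode name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# The third entry is the Candrabindu, following the vowel sign it nasalizes.
for code_point, name in describe(word):
    print(code_point, name)
```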

Visarga and Avagraha: The Visarga U+0983 is frequently used in Bengali loanwords borrowed from Sanskrit and represents a sound very close to /h/. More details in Section 3.3.7, “Visarga and Avagraha” of [Proposal-Bengali].

Ya-phala: There are two instances in Bangla where a Hasanta is preceded by a full vowel (U+0985 BENGALI LETTER A and U+098F BENGALI LETTER E). More details in Section 3.3.9, “Use of Ya-phala” of [Proposal-Bengali].

Ra-phala and Ref Sequences: RA+Hasanta (Repha or Ra-phala sequences). More details in Section 3.3.10, “Ra-phala and Ref Sequences” of [Proposal-Bengali].

Nukta: Nukta is not listed by itself in the repertoire; it is only included in three sequences. More details in Section 3.3.6, “Nukta” of [Proposal-Bengali].

Zero Width Non-joiner (ZWNJ) and Zero Width Joiner (ZWJ): These are not included in the repertoire. More details in Section 3.3.8, “Zero Width Non-joiner (U+200C) and Zero Width Joiner (U+200D)” of [Proposal-Bengali].

Whole Label Evaluation (WLE) and Context Rules

Default Whole Label Evaluation Rules and Actions

The LGR includes the set of required default WLE rules and actions applicable to the Root Zone and defined in [MSR-6]. They are marked with ⍟. The default prohibition on leading combining marks is equivalent to ensuring that a label only starts with a consonant or vowel. The actions compute a label disposition based on WLE rules or variant mapping types.
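
The default prohibition on leading combining marks can be approximated as follows. This sketch tests the Unicode General Category of the first code point (any category starting with "M"); that approximation and the function name are assumptions of this example, and the normative rule is the one defined in [MSR-6].

```python
import unicodedata

def starts_with_combining_mark(label):
    """Return True if the first code point of `label` is a combining mark."""
    return bool(label) and unicodedata.category(label[0]).startswith("M")

# A label starting with a vowel sign (Kar) is rejected by this check,
# while the same code points in consonant-first order are accepted.
print(starts_with_combining_mark("\u09BE\u0995"))
print(starts_with_combining_mark("\u0995\u09BE"))
```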

Bengali-specific Rules

These rules have been formulated as context rules suitable for adoption into an LGR specification.

The following symbols are used in the WLE rules:

C → Consonant
M → Kar (Matra)
V → Vowel
B → Onushshar (Anusvara)
X → Bisarga (Visarga)
D → Candrabindu
H → Hasanta (Halant)
Z → KhandaTa
P → Ra-Hasanta
S → (a/e) Ya-phala

The rules are:

1. C: the class C comprises the basic consonants together with CN, where CN is the set of normalized forms of {ড়,ঢ়,য়}
2. H: must be preceded by C
3. M: must be preceded by C
4. D: must be preceded by any of V, C, M
5. X: must be preceded by any of V, C, M, D
6. B: must be preceded by any of V, C, M, D
7. Z: must be preceded by any of V, C, M, D, B, X, P
8. V: CANNOT be preceded by H
9. S: CANNOT be preceded by H
10. U+09B0 and U+09F0 CANNOT be mixed in the same label

More details in Section 7, “Whole Label Evaluation Rules (WLE)” of [Proposal-Bengali].
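
The rules listed above can be sketched as a simple checker over a label. The class memberships below are an approximation drawn from the Unicode Bengali block and are assumptions of this example, not the normative listing of this LGR. P is assumed to be U+09B0 or U+09F0 followed by Hasanta. Rule 9 (S) and the sequences that override a WLE constraint are not modelled, so this sketch rejects those sequences. All names are hypothetical.

```python
RA, ASSAMESE_RA = "\u09B0", "\u09F0"
HASANTA, NUKTA = "\u09CD", "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # bases of the nukta sequences (CN)

def _span(first, last):
    return {chr(cp) for cp in range(first, last + 1)}

# Approximate class membership (assumption, not the normative repertoire).
CLASSES = {
    "C": _span(0x0995, 0x09A8) | _span(0x09AA, 0x09B0) | {"\u09B2"}
         | _span(0x09B6, 0x09B9) | {ASSAMESE_RA, "\u09F1"},
    "V": _span(0x0985, 0x098C) | {"\u098F", "\u0990", "\u0993", "\u0994"},
    "M": _span(0x09BE, 0x09C4) | {"\u09C7", "\u09C8", "\u09CB", "\u09CC"},
    "B": {"\u0982"}, "X": {"\u0983"}, "D": {"\u0981"},
    "H": {HASANTA}, "Z": {"\u09CE"},
}
CLASS_OF = {ch: sym for sym, members in CLASSES.items() for ch in members}

# Rules 2-7: symbol -> (rule number, classes allowed immediately before it).
MUST_FOLLOW = {
    "H": (2, {"C"}),
    "M": (3, {"C"}),
    "D": (4, {"V", "C", "M"}),
    "X": (5, {"V", "C", "M", "D"}),
    "B": (6, {"V", "C", "M", "D"}),
    "Z": (7, {"V", "C", "M", "D", "B", "X"}),  # P is handled separately
}

def tokenize(label):
    """Split a label into (class symbol, text) tokens; nukta sequences become C."""
    tokens, i = [], 0
    while i < len(label):
        ch = label[i]
        if ch in NUKTA_BASES and label[i + 1:i + 2] == NUKTA:
            tokens.append(("C", ch + NUKTA))
            i += 2
        elif ch in CLASS_OF:
            tokens.append((CLASS_OF[ch], ch))
            i += 1
        else:
            raise ValueError(f"U+{ord(ch):04X} is outside the assumed repertoire")
    return tokens

def violations(label):
    """Return the numbers of the rules (2-8 and 10) that the label violates."""
    tokens = tokenize(label)
    found = []
    for i, (sym, _) in enumerate(tokens):
        prev = tokens[i - 1][0] if i > 0 else None
        if sym in MUST_FOLLOW:
            rule, allowed = MUST_FOLLOW[sym]
            ok = prev in allowed
            # Rule 7 also accepts P (Ra-Hasanta) immediately before Z.
            if sym == "Z" and prev == "H" and i >= 2:
                ok = ok or tokens[i - 2][1] in (RA, ASSAMESE_RA)
            if not ok:
                found.append(rule)
        elif sym == "V" and prev == "H":
            found.append(8)
    if RA in label and ASSAMESE_RA in label:
        found.append(10)
    return found

print(violations("\u099A\u09BE\u0981\u09A6"))  # C M D C
print(violations("\u0995\u09CD\u0985"))        # C H V
```

A lone U+09BC raises ValueError in this sketch, mirroring the fact that Nukta is admitted only as part of the three sequences.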

The following context rule is used for variants of Halant:

The variant is not defined unless the Halant is followed by the end of the label.
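
A minimal sketch of this context condition, assuming the Halant is U+09CD and using a hypothetical function name:

```python
HASANTA = "\u09CD"  # Halant / Hasanta

def halant_variant_applies(label, index):
    """Return True if the Halant at `index` is in label-final position."""
    return label[index] == HASANTA and index == len(label) - 1

# In a label with two Halants, only the final one is subject to the variant;
# the first one takes part in forming a conjunct.
label = "\u0995\u09CD\u0995\u09CD"
print([i for i in range(len(label)) if halant_variant_applies(label, i)])
```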

Methodology and Contributors

The Root Zone LGR for the Bengali (Bangla) Script was developed by the Neo-Brahmi Generation Panel (NBGP), the members of which have experience in linguistics and computational linguistics in a wide variety of languages written with Neo-Brahmi scripts. Under the Neo-Brahmi Generation Panel, there are nine scripts belonging to separate Unicode blocks. Each of these scripts has been assigned a separate LGR, with the Neo-Brahmi GP ensuring that the fundamental philosophy behind building each LGR is in sync with all other Brahmi-derived scripts. For further details on methodology and contributors, see Sections 4 and 8 in [Proposal-Bengali], as well as [RZ-LGR-6-Overview].

Changes from RZ LGR-5

In RZ LGR-6 the following changes were made:

A clerical error has been fixed: the character class for U+0994 Letter AU has been corrected to Vowel with a corresponding context rule. As of RZ LGR-6, affected labels are reported as valid in accordance with [Proposal-Bengali].
Two missing cross-script variants were added for Candrabindu and Hasanta and their Devanagari counterparts. These code points were mistakenly believed not to participate in the formation of possible variant labels.

For the prior version see [RZ-LGR-5-Beng].

References

The following general references are cited in this document:

[EGIDS]: Lewis and Simons, “EGIDS: Expanded Graded Intergenerational Disruption Scale,” documented in [SIL-Ethnologue] and summarized here: https://en.wikipedia.org/wiki/Expanded_Graded_Intergenerational_Disruption_Scale_(EGIDS)
[MSR-6]: Integration Panel, “Maximal Starting Repertoire — MSR-6 Overview and Rationale”, 23 September 2025, https://www.icann.org/en/system/files/files/msr-6-overview-23sep25-en.pdf
[Proposal-Bengali]: Neo-Brahmi Generation Panel, “Proposal for a Bangla (Bengali) Script Root Zone Label Generation Rule-Set (LGR)”, 20 May 2020 (PDF), https://www.icann.org/en/system/files/files/proposal-bangla-lgr-20may20-en.pdf
[RFC 7940]: Davies, K. and A. Freytag, “Representing Label Generation Rulesets Using XML”, RFC 7940, August 2016, https://www.rfc-editor.org/info/rfc7940
[RFC 8228]: A. Freytag, “Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels”, RFC 8228, August 2017, https://www.rfc-editor.org/info/rfc8228
[RZ-LGR-5-Beng]: ICANN, Root Zone Label Generation Rules for the Bengali (Bangla) Script (und-Beng), Version 5, 26 May 2022 (XML) https://www.icann.org/sites/default/files/lgr/rz-lgr-5-bengali-script-26may22-en.xml
non-normative HTML presentation: https://www.icann.org/sites/default/files/lgr/rz-lgr-5-bengali-script-26may22-en.html
[RZ-LGR-6]: Integration Panel, “Root Zone Label Generation Rules (RZ-LGR-6)”, 23 September 2025 (XML), https://www.icann.org/sites/default/files/lgr/rz-lgr-6-common-23sep25-en.xml
non-normative HTML presentation: https://www.icann.org/sites/default/files/lgr/rz-lgr-6-common-23sep25-en.html
[RZ-LGR-6-Overview]: Integration Panel, “Root Zone Label Generation Rules (RZ LGR-6): Overview and Summary”, 23 September 2025, https://www.icann.org/sites/default/files/lgr/rz-lgr-6-overview-23sep25-en.pdf
[SIL-Ethnologue]: David M. Eberhard, Gary F. Simons & Charles D. Fennig (eds.). 2021. Ethnologue: Languages of the World, Twenty-fourth edition. Dallas, Texas: SIL International. Online version available as https://www.ethnologue.com
[Unicode 16.0.0]: The Unicode Consortium. The Unicode Standard, Version 16.0.0 (South San Francisco: The Unicode Consortium, 2024. ISBN 978-1-936213-34-4), https://www.unicode.org/versions/Unicode16.0.0/

For references consulted, particularly in designing the repertoire for the Bengali script for the Root Zone, please see details in the Table of References below. References [0] and [7] refer to the Unicode Standard versions in which the corresponding code points were initially encoded. References [101] and above correspond to sources given in [Proposal-Bengali] justifying the inclusion of the corresponding code points. Entries in the table may have multiple source reference values.

[0] The Unicode Standard, Version 1.1
[7] The Unicode Standard, Version 4.1
[101] Wikipedia, “Bengali alphabet”, accessed on 2017-11-25, https://en.wikipedia.org/wiki/Bengali_alphabet
[102] Bengali alphabet for Manipuri, found in Omniglot, “Manipuri (Meeteilon/ Meithei)”, accessed on 2019-10-20, https://www.omniglot.com/writing/manipuri.htm
[103] Omniglot, “Assamese (অসমীয়া)”, accessed on 2020-04-28, https://www.omniglot.com/writing/assamese.htm