This document specifies a reference set of Label Generation Rules for Bulgarian, using a limited repertoire as appropriate for a second-level domain.
All references converge on 30 Cyrillic code points (23 + 7 as defined by RFC 5992 [130]). CLDR, in its auxiliary list, adds 7 code points, 2 of which correspond to historic letters: U+0463 CYRILLIC SMALL LETTER YAT and U+046B CYRILLIC SMALL LETTER BIG YUS. Both are also excluded from MSR-2 (see MSR-2 in the Table of References) as obsolete. Of the other 5, 3 have no other sources and are not included. The last 2, U+0450 CYRILLIC SMALL LETTER IE WITH GRAVE and U+045D CYRILLIC SMALL LETTER I WITH GRAVE, have additional sources and are part of the extended set.
Note: The article in [605] indicates that "In Bulgarian and Macedonian, the grave accent is sometimes, although not very often, used on the vowels а, о, у, е, и and ъ (ъ exists in Bulgarian only) to mark stress...In a few cases (mostly on the vowels е and и) the stress mark is orthographically required to distinguish words which are Homographs".
An IDN table for Bulgarian, published by .bg (the ccTLD for Bulgaria), is available in the IANA Repository of IDN Practices [700].
Letters documented in some references but not included:
U+044B CYRILLIC SMALL LETTER YERU
U+044D CYRILLIC SMALL LETTER E
U+0451 CYRILLIC SMALL LETTER IO
U+0463 CYRILLIC SMALL LETTER YAT
U+046B CYRILLIC SMALL LETTER BIG YUS
U+0430 U+0300 CYRILLIC SMALL LETTER A WITH GRAVE ACCENT
U+043E U+0300 CYRILLIC SMALL LETTER O WITH GRAVE ACCENT
U+0443 U+0300 CYRILLIC SMALL LETTER U WITH GRAVE ACCENT
U+044A U+0300 CYRILLIC SMALL LETTER HARD SIGN WITH GRAVE ACCENT
U+044E U+0300 CYRILLIC SMALL LETTER YU WITH GRAVE ACCENT
U+044F U+0300 CYRILLIC SMALL LETTER YA WITH GRAVE ACCENT
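The difference between the two extended code points and the sequences listed above can be illustrated with Unicode normalization. The following is a minimal sketch, not part of the LGR itself: it uses Python's standard unicodedata module to show that IE and I followed by U+0300 compose under NFC to the single code points U+0450 and U+045D, while a sequence such as U+0430 U+0300 has no precomposed form and remains two code points.

```python
import unicodedata

def nfc_code_points(text):
    """Return the code points of text after NFC normalization, as hex strings."""
    normalized = unicodedata.normalize("NFC", text)
    return ["U+%04X" % ord(ch) for ch in normalized]

# IE + COMBINING GRAVE ACCENT composes to a single code point.
ie_grave = nfc_code_points("\u0435\u0300")

# I + COMBINING GRAVE ACCENT composes to a single code point.
i_grave = nfc_code_points("\u0438\u0300")

# A + COMBINING GRAVE ACCENT has no precomposed form and stays a sequence.
a_grave = nfc_code_points("\u0430\u0300")

print(ie_grave, i_grave, a_grave)
```

Because IDNA labels are expected to be in NFC, the grave-accented forms of IE and I can only appear as the single code points U+0450 and U+045D, whereas the other accented vowels would require an explicit code point sequence in the repertoire.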
Two letters not considered essential to writing the core vocabulary of the language are nevertheless in common use. Where they have not been added to the core repertoire, they are flagged as "extended-cp" in the table of code points. A context rule is provided that by default will prohibit labels with extended code points. To support extended single code points or code point sequences, delete the context "extended-cp" from their repertoire definition.
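The default treatment of extended code points can be sketched as a simple repertoire check. This is an illustration only, under stated assumptions: the names CORE, EXTENDED, and is_in_repertoire are hypothetical, the core set is taken to be U+0430 through U+044F without U+044B and U+044D (which yields the 30 letters), and digits and the hyphen are left out for brevity. The allow_extended flag stands in for deleting the "extended-cp" context from the repertoire definition.

```python
# Assumed core repertoire: U+0430..U+044F minus YERU (U+044B) and E (U+044D).
CORE = {chr(cp) for cp in range(0x0430, 0x0450)} - {"\u044B", "\u044D"}

# Extended code points: IE WITH GRAVE and I WITH GRAVE.
EXTENDED = {"\u0450", "\u045D"}

def is_in_repertoire(label, allow_extended=False):
    """Check that every code point of label is permitted.

    By default, extended code points invalidate the label, mirroring the
    default "extended-cp" context rule. Passing allow_extended=True models
    removing that context from the repertoire definition.
    """
    allowed = (CORE | EXTENDED) if allow_extended else CORE
    return all(ch in allowed for ch in label)

print(len(CORE))
print(is_in_repertoire("българия"))
print(is_in_repertoire("\u045D"))
print(is_in_repertoire("\u045D", allow_extended=True))
```

A real implementation would evaluate the context rule per code point as the LGR specifies; the set lookup here only approximates the outcome.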
This LGR defines no variants.
This LGR defines no named character classes.
Common rules only:
Hyphen Restrictions: restrictions on the allowable placement of hyphens (no leading or trailing hyphen, and no "--" in positions 3 and 4). These restrictions are described in section 4.2.3.1 of RFC 5891 [120]. They are implemented here as a context rule on U+002D (-) HYPHEN-MINUS.
Leading Combining Marks: restrictions on the allowable placement of combining marks (no leading combining mark). This rule is described in section 4.2.3.2 of RFC 5891 [120].
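The two common rules can be sketched as plain string checks. This is a hedged illustration rather than the LGR's rule syntax: the function names hyphen_ok and no_leading_combining_mark are hypothetical, the hyphen check follows the reading of RFC 5891 section 4.2.3.1 in which positions 3 and 4 must not both be hyphens, and a combining mark is taken to be any code point whose Unicode general category starts with "M".

```python
import unicodedata

def hyphen_ok(label):
    """Return True if hyphen placement satisfies the restrictions.

    Rejects a leading hyphen, a trailing hyphen, and "--" in the
    third and fourth positions.
    """
    if label.startswith("-") or label.endswith("-"):
        return False
    return label[2:4] != "--"

def no_leading_combining_mark(label):
    """Return True unless the label begins with a combining mark.

    Combining marks are identified by general category Mn, Mc, or Me.
    """
    if not label:
        return True
    return not unicodedata.category(label[0]).startswith("M")

print(hyphen_ok("аб-вг"), hyphen_ok("аб--вг"), hyphen_ok("-аб"))
print(no_leading_combining_mark("а\u0300"), no_leading_combining_mark("\u0300а"))
```

In the LGR itself these checks are expressed declaratively, with the hyphen restriction attached as a context rule to U+002D and the combining-mark restriction enforced through an action.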
Actions included are the default actions for LGRs as well as those needed to invalidate labels with misplaced combining marks.
Variant-related actions are included to facilitate integration, as appropriate.
This reference LGR for Bulgarian for the 2nd Level has been developed by Michel Suignard and Asmus Freytag, verified in expert reviews by Michael Everson, Nicholas Ostler, and Wil Tan, and based on multiple open public consultations.
Language tag has been updated.
General reference for the language:
Scatton, Ernest B. 1993. "Bulgarian", in Bernard Comrie & Greville G. Corbett, eds. The Slavonic languages. London; New York: Routledge. ISBN 0-415-04755-2
In the listing of the repertoire by code point, references starting from [0] refer to the version of the Unicode Standard in which the corresponding code point was initially encoded. Other references (starting from [100]) document usage of code points. For more details, see the Table of References below.
]]>