Root Zone LGR for script und-Arab (Arabic) lgr-3-Arabic-Script-25apr19-en

This document is mechanically formatted from the XML file for the LGR. It provides additional summary data and explanatory text. The XML file remains the sole normative specification of the LGR.

Date 2019-04-25
LGR Version 3
Language und-Arab
Scope domain: "." (Root)
Unicode Version 6.3.0

1 Description

Root Zone Label Generation Rules for the Arabic Script

Overview

This file contains a set of Label Generation Rules (LGR) for Arabic for the Root Zone. For more details on this LGR and its development, see TF-AIDN, "Proposal for Arabic Script Root Zone LGR", Version 3.4, 18 November 2015 [Proposal]. The format of this file follows [RFC 7940].

Repertoire

The repertoire is described in Section 3.2 of [Proposal] and only includes the 128 code points used by languages that are actively written in the Arabic script. It excludes code points for which TF-AIDN was unable to find sufficient evidence of use (see Appendix F in [Proposal]). The repertoire is based on [MSR-4], which is a subset of [Unicode 6.3].

This LGR does not include combining marks or code point sequences; all combining marks have been excluded.

As part of the Root Zone, this LGR includes neither digits nor the HYPHEN-MINUS.

For further details, see Section 3.2 "Code point repertoire included", in [Proposal].

Each code point or range is tagged with the script or scripts that the code point is used with, and with one or more references documenting sufficient justification for inclusion in the repertoire; see "References" below. Comments identify the languages using the code point.

Variants

This LGR includes "blocked" and "allocatable" variants, assigned according to Section 4 "Final recommendation of variants for Top Level Domains (TLDs)" in [Proposal]. These recommendations balance the desire to minimize the number of possible allocatable variants with the need to keep the definition of variants simple. See also the comments given in the listing.

The specification of variants in the Root Zone LGR follows the guidelines in [RFC 8228].

Character Classes

This proposal does not define named character classes.

Whole Label Evaluation (WLE) and Context Rules

Default Whole Label Evaluation Rules and Actions

The LGR includes the set of required default WLE rules and actions applicable to the Root Zone and defined in [MSR-4]. They are marked with ⍟.

Arabic-specific Rules and Actions

This LGR includes WLE rules and actions specific to the Arabic script. See Section 5 "Whole Label Evaluation (WLE) rules", in [Proposal]. As specified, the rules and actions serve to prevent the mixing of two variants of the same code point within the same label. This reduces overproduction of variant labels. The rules are listed here with the numbers given in Table 17 in [Proposal]. See also the comments given for each rule or action.

Methodology and Contributors

The proposal for an Arabic Script Root Zone LGR [Proposal], on which this LGR is based, was developed by the Task Force for Arabic Script IDNs [TF-AIDN], based on multiple open public consultations.

For more information and for methodology and contributors, see [Proposal].

References

The following general references are cited in this document:

[IAB]
Internet Architecture Board (IAB), "IAB Statement on Identifiers and Unicode 7.0.0"
https://www.iab.org/documents/correspondence-reports-documents/2015-2/iab-statement-on-identifiers-and-unicode-7-0-0/
[MSR-4]
Integration Panel, "Maximal Starting Repertoire — MSR-4 Overview and Rationale", 7 February 2019,
https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf
[Proposal]
TF-AIDN, "Proposal for Arabic Script Root Zone LGR", Version 3.4, 18 November 2015
https://www.icann.org/en/system/files/files/arabic-lgr-proposal-18nov15-en.pdf
[RFC 6365]
Hoffman, P. and J. Klensin, "Terminology Used in Internationalization in the IETF", BCP 166, RFC 6365, DOI 10.17487/RFC6365, September 2011,
http://www.rfc-editor.org/info/rfc6365
[RFC 7940]
Davies, K. and A. Freytag, "Representing Label Generation Rulesets Using XML", RFC 7940, August 2016,
http://www.rfc-editor.org/info/rfc7940
[RFC 8228]
Freytag, A., "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", RFC 8228, August 2017,
https://www.rfc-editor.org/info/rfc8228
[Unicode 6.3]
The Unicode Consortium. The Unicode Standard, Version 6.3.0, (Mountain View, CA: The Unicode Consortium, 2013. ISBN 978-1-936213-08-5)
http://www.unicode.org/versions/Unicode6.3.0/
[TF-AIDN]
Blog, "Task Force for Arabic Script IDNs"
https://www.icann.org/news/blog/what-is-the-task-force-on-arabic-script-idns-tf-aidn-up-to

For references consulted particularly in designing the repertoire for the Arabic script for the Root Zone, please see details in the Table of References below. References [0] to [12] refer to Unicode Standard versions in which the corresponding code points were initially encoded. References [100] and above correspond to sources justifying the inclusion of the corresponding code points. Single code points or ranges may have multiple source reference values.

2 Repertoire

Summary

Number of elements in Repertoire 128
Longest code point sequence 1

Repertoire by Code Point

The following table lists the repertoire by code point (or code point sequence). The data in the Script and Name column are extracted from the Unicode character database. Where a comment in the original LGR is equal to the character name, it has been suppressed.

For any code point or sequence for which a variant is defined, additional information is provided in the Variants column. See also the legend provided below the table.

Code Point Glyph Script Name Ref Variants Comment
U+0620 ؠ Arabic ARABIC LETTER KASHMIRI YEH [11], [115]   Kashmiri
U+0621 ء Arabic ARABIC LETTER HAMZA [0], [100]   Arabic
U+0622 آ Arabic ARABIC LETTER ALEF WITH MADDA ABOVE [0], [100] set 1 Arabic
U+0623 أ Arabic ARABIC LETTER ALEF WITH HAMZA ABOVE [0], [100] set 1 Arabic
U+0624 ؤ Arabic ARABIC LETTER WAW WITH HAMZA ABOVE [0], [100] set 2 Arabic
U+0625 إ Arabic ARABIC LETTER ALEF WITH HAMZA BELOW [0], [100] set 1 Arabic
U+0626 ئ Arabic ARABIC LETTER YEH WITH HAMZA ABOVE [0], [100] set 3 Arabic
U+0627 ا Arabic ARABIC LETTER ALEF [0], [100] set 1 Arabic
U+0628 ب Arabic ARABIC LETTER BEH [0], [100]   Arabic
U+0629 ة Arabic ARABIC LETTER TEH MARBUTA [0], [100] set 4 Arabic
U+062A ت Arabic ARABIC LETTER TEH [0], [100] set 5 Arabic
U+062B ث Arabic ARABIC LETTER THEH [0], [100] set 6 Arabic
U+062C ج Arabic ARABIC LETTER JEEM [0], [100]   Arabic
U+062D ح Arabic ARABIC LETTER HAH [0], [100]   Arabic
U+062E خ Arabic ARABIC LETTER KHAH [0], [100]   Arabic
U+062F د Arabic ARABIC LETTER DAL [0], [100]   Arabic
U+0630 ذ Arabic ARABIC LETTER THAL [0], [100]   Arabic
U+0631 ر Arabic ARABIC LETTER REH [0], [100]   Arabic
U+0632 ز Arabic ARABIC LETTER ZAIN [0], [100]   Arabic
U+0633 س Arabic ARABIC LETTER SEEN [0], [100]   Arabic
U+0634 ش Arabic ARABIC LETTER SHEEN [0], [100]   Arabic
U+0635 ص Arabic ARABIC LETTER SAD [0], [100]   Arabic
U+0636 ض Arabic ARABIC LETTER DAD [0], [100]   Arabic
U+0637 ط Arabic ARABIC LETTER TAH [0], [100]   Arabic
U+0638 ظ Arabic ARABIC LETTER ZAH [0], [100]   Arabic
U+0639 ع Arabic ARABIC LETTER AIN [0], [100]   Arabic
U+063A غ Arabic ARABIC LETTER GHAIN [0], [100]   Arabic
U+0641 ف Arabic ARABIC LETTER FEH [0], [100] set 7 Arabic
U+0642 ق Arabic ARABIC LETTER QAF [0], [100] set 7 Arabic
U+0643 ك Arabic ARABIC LETTER KAF [0], [100] set 8 Arabic
U+0644 ل Arabic ARABIC LETTER LAM [0], [100]   Arabic
U+0645 م Arabic ARABIC LETTER MEEM [0], [100]   Arabic
U+0646 ن Arabic ARABIC LETTER NOON [0], [100] set 9 Arabic
U+0647 ه Arabic ARABIC LETTER HEH [0], [100] set 4 Arabic
U+0648 و Arabic ARABIC LETTER WAW [0], [100] set 2 Arabic
U+0649 ى Arabic ARABIC LETTER ALEF MAKSURA [0], [100] set 3 Arabic
U+064A ي Arabic ARABIC LETTER YEH [0], [100] set 3 Arabic
U+0672 ٲ Arabic ARABIC LETTER ALEF WITH WAVY HAMZA ABOVE [0], [102] set 1 Kashmiri
U+0679 ٹ Arabic ARABIC LETTER TTEH [0], [112] set 10 Urdu
U+067A ٺ Arabic ARABIC LETTER TTEHEH [0], [111] set 5 Sindhi
U+067B ٻ Arabic ARABIC LETTER BEEH [0], [111] set 3 Sindhi
U+067C ټ Arabic ARABIC LETTER TEH WITH RING [0], [108]   Pashto
U+067D ٽ Arabic ARABIC LETTER TEH WITH THREE DOTS ABOVE DOWNWARDS [0], [111] set 6 Sindhi
U+067E پ Arabic ARABIC LETTER PEH [0], [109] set 11 Persian
U+067F ٿ Arabic ARABIC LETTER TEHEH [0], [111]   Sindhi
U+0680 ڀ Arabic ARABIC LETTER BEHEH [0], [111]   Sindhi
U+0681 ځ Arabic ARABIC LETTER HAH WITH HAMZA ABOVE [0], [108], [138]   Pashto
U+0683 ڃ Arabic ARABIC LETTER NYEH [0], [111] set 12 Sindhi
U+0684 ڄ Arabic ARABIC LETTER DYEH [0], [111] set 12 Sindhi
U+0685 څ Arabic ARABIC LETTER HAH WITH THREE DOTS ABOVE [0], [108], [138]   Pashto
U+0686 چ Arabic ARABIC LETTER TCHEH [0], [109]   Persian
U+0687 ڇ Arabic ARABIC LETTER TCHEHEH [0], [111]   Sindhi
U+0688 ڈ Arabic ARABIC LETTER DDAL [0], [112]   Urdu
U+0689 ډ Arabic ARABIC LETTER DAL WITH RING [0], [108], [138]   Pashto
U+068A ڊ Arabic ARABIC LETTER DAL WITH DOT BELOW [0], [111]   Sindhi
U+068B ڋ Arabic ARABIC LETTER DAL WITH DOT BELOW AND SMALL TAH [0], [110], [139]   Saraiki
U+068C ڌ Arabic ARABIC LETTER DAHAL [0], [111]   Sindhi
U+068D ڍ Arabic ARABIC LETTER DDAHAL [0], [111]   Sindhi
U+068E ڎ Arabic ARABIC LETTER DUL [0], [137] set 13 Malay
U+068F ڏ Arabic ARABIC LETTER DAL WITH THREE DOTS ABOVE DOWNWARDS [0], [111] set 13 Sindhi
U+0691 ڑ Arabic ARABIC LETTER RREH [0], [112]   Urdu
U+0693 ړ Arabic ARABIC LETTER REH WITH RING [0], [108], [138]   Pashto
U+0695 ڕ Arabic ARABIC LETTER REH WITH SMALL V BELOW [0], [106], [140]   Kurdish
U+0696 ږ Arabic ARABIC LETTER REH WITH DOT BELOW AND DOT ABOVE [0], [108], [138]   Pashto
U+0697 ڗ Arabic ARABIC LETTER REH WITH TWO DOTS ABOVE [0], [119], [146]   ANT
U+0698 ژ Arabic ARABIC LETTER JEH [0], [112]   Urdu
U+0699 ڙ Arabic ARABIC LETTER REH WITH FOUR DOTS ABOVE [0], [111], [143]   Sindhi, Torwali
U+069A ښ Arabic ARABIC LETTER SEEN WITH DOT BELOW AND DOT ABOVE [0], [108], [138]   Pashto
U+069F ڟ Arabic ARABIC LETTER TAH WITH THREE DOTS ABOVE [0], [121], [123], [130]   Hausa, Ajami
U+06A0 ڠ Arabic ARABIC LETTER AIN WITH THREE DOTS ABOVE [0], [107], [129], [144]   Malay
U+06A2 ڢ Arabic ARABIC LETTER FEH WITH DOT MOVED BELOW [0], [101], [130], [131], [132] set 7 Ajami, Fulfulde, Hausa
U+06A4 ڤ Arabic ARABIC LETTER VEH [0], [106], [107], [127], [140] set 14 Malay, Kurdish
U+06A6 ڦ Arabic ARABIC LETTER PEHEH [0], [111]   Sindhi
U+06A7 ڧ Arabic ARABIC LETTER QAF WITH DOT ABOVE [0], [101], [130], [131], [132] set 7 Ajami, Fulfulde, Hausa
U+06A8 ڨ Arabic ARABIC LETTER QAF WITH THREE DOTS ABOVE [0], [124] set 14 Western Arabic
U+06A9 ک Arabic ARABIC LETTER KEHEH [0], [112] set 8 Urdu
U+06AA ڪ Arabic ARABIC LETTER SWASH KAF [0], [111] set 8 Sindhi
U+06AB ګ Arabic ARABIC LETTER KAF WITH RING [0], [108], [138] set 15 Pashto
U+06AD ڭ Arabic ARABIC LETTER NG [0], [105], [114], [133], [134] set 15 Kirghiz, Uyghur
U+06AE ڮ Arabic ARABIC LETTER KAF WITH THREE DOTS BELOW [0], [116]   ANT
U+06AF گ Arabic ARABIC LETTER GAF [0], [109] set 15 Persian
U+06B0 ڰ Arabic ARABIC LETTER GAF WITH RING [0], [110]   Saraiki
U+06B1 ڱ Arabic ARABIC LETTER NGOEH [0], [111]   Sindhi
U+06B3 ڳ Arabic ARABIC LETTER GUEH [0], [111]   Sindhi
U+06B5 ڵ Arabic ARABIC LETTER LAM WITH SMALL V [0], [106], [140]   Kurdish
U+06BA ں Arabic ARABIC LETTER NOON GHUNNA [3], [112] set 9 Urdu
U+06BB ڻ Arabic ARABIC LETTER RNOON [3], [111] set 10 Sindhi
U+06BC ڼ Arabic ARABIC LETTER NOON WITH RING [3], [108], [138]   Pashto
U+06BD ڽ Arabic ARABIC LETTER NOON WITH THREE DOTS ABOVE [0], [107] set 11 Malay
U+06BE ھ Arabic ARABIC LETTER HEH DOACHASHMEE [0], [112] set 4 Urdu
U+06C0 ۀ Arabic ARABIC LETTER HEH WITH YEH ABOVE [0], [116], [140] set 4 ANT, Kurdish
U+06C1 ہ Arabic ARABIC LETTER HEH GOAL [0], [112] set 4 Urdu
U+06C2 ۂ Arabic ARABIC LETTER HEH GOAL WITH HAMZA ABOVE [0], [125], [135], [141] set 4 Urdu
U+06C3 ۃ Arabic ARABIC LETTER TEH MARBUTA GOAL [0], [126] set 4 Urdu
U+06C4 ۄ Arabic ARABIC LETTER WAW WITH RING [0], [102]   Kashmiri
U+06C6 ۆ Arabic ARABIC LETTER OE [0], [102], [140], [142]   Kashmiri, Kurdish, Uyghur
U+06CB ۋ Arabic ARABIC LETTER VE [0], [103], [114], [136]   Kazakh, Kirghiz, Uyghur
U+06CC ی Arabic ARABIC LETTER FARSI YEH [0], [112] set 3 Urdu
U+06CD ۍ Arabic ARABIC LETTER YEH WITH TAIL [0], [108], [138] set 3 Pashto
U+06CE ێ Arabic ARABIC LETTER YEH WITH SMALL V [0], [127], [140]   Kurdish
U+06CF ۏ Arabic ARABIC LETTER WAW WITH DOT ABOVE [3], [107]   Malay
U+06D0 ې Arabic ARABIC LETTER E [0], [108], [138] set 3 Pashto
U+06D1 ۑ Arabic ARABIC LETTER YEH WITH THREE DOTS BELOW [0], [122] set 11 Bamana, Mandika
U+06D2 ے Arabic ARABIC LETTER YEH BARREE [0], [112] set 3 Urdu
U+06D5 ە Arabic ARABIC LETTER AE [0], [106], [114], [140] set 4 Kurdish, Uyghur
U+0751 ݑ Arabic ARABIC LETTER BEH WITH DOT BELOW AND THREE DOTS ABOVE [7], [121], [128], [130], [147]   Hausa, Wolof
U+0752 ݒ Arabic ARABIC LETTER BEH WITH THREE DOTS POINTING UPWARDS BELOW [7], [113], [130] set 11 Ajami, Wolof
U+0756 ݖ Arabic ARABIC LETTER BEH WITH SMALL V [7], [113], [120], [130]   Ajami, Wolof
U+0760 ݠ Arabic ARABIC LETTER FEH WITH TWO DOTS BELOW [7], [121], [130]   Ajami, Hausa
U+0762 ݢ Arabic ARABIC LETTER KEHEH WITH DOT ABOVE [7], [118], [129]   Malay
U+0763 ݣ Arabic ARABIC LETTER KEHEH WITH THREE DOTS ABOVE [7], [118] set 15 Moroccan
U+0766 ݦ Arabic ARABIC LETTER MEEM WITH DOT BELOW [7], [116], [121]   ANT
U+0767 ݧ Arabic ARABIC LETTER NOON WITH TWO DOTS BELOW [7], [113], [120] set 16 Wolof
U+0768 ݨ Arabic ARABIC LETTER NOON WITH SMALL TAH [7], [145]   Saraiki
U+076A ݪ Arabic ARABIC LETTER LAM WITH BAR [7], [113], [116], [120]   ANT, Wolof
U+076E ݮ Arabic ARABIC LETTER HAH WITH SMALL ARABIC LETTER TAH BELOW [9], [104]   Khowar
U+076F ݯ Arabic ARABIC LETTER HAH WITH SMALL ARABIC LETTER TAH AND TWO DOTS [9], [104]   Khowar
U+0770 ݰ Arabic ARABIC LETTER SEEN WITH SMALL ARABIC LETTER TAH AND TWO DOTS [9], [104]   Khowar
U+0771 ݱ Arabic ARABIC LETTER REH WITH SMALL ARABIC LETTER TAH AND TWO DOTS [9], [104]   Khowar
U+08A0 Arabic ARABIC LETTER BEH WITH SMALL V BELOW [12], [117]   DPLN
U+08A2 Arabic ARABIC LETTER JEEM WITH TWO DOTS ABOVE [12], [117]   DPLN
U+08A3 Arabic ARABIC LETTER TAH WITH TWO DOTS ABOVE [12], [113], [117]   DPLN, Wolof
U+08A4 Arabic ARABIC LETTER FEH WITH DOT BELOW AND THREE DOTS ABOVE [12], [116]   ANT
U+08A5 Arabic ARABIC LETTER QAF WITH DOT BELOW [12], [116]   ANT
U+08A6 Arabic ARABIC LETTER LAM WITH DOUBLE BAR [12], [116]   ANT
U+08A7 Arabic ARABIC LETTER MEEM WITH THREE DOTS ABOVE [12], [116]   ANT
U+08A8 Arabic ARABIC LETTER YEH WITH TWO DOTS BELOW AND HAMZA ABOVE [12], [121]   Hausa
U+08A9 Arabic ARABIC LETTER YEH WITH TWO DOTS BELOW AND DOT ABOVE [12], [121] set 16 Hausa

Legend

Code Point
A code point or code point sequence.
Name
Shows the character or sequence name from the Unicode Character Database. Named sequences are listed with their normative names; for ad-hoc sequences, the individual names are shown separated by "+".
Glyph
The shape displayed depends on the fonts available to your browser.
Script
Shows the script property value from the Unicode Character Database. Combining marks may have the value Inherited and code points used with more than one script may have the value Common.
Ref
Links to the references associated with the code point or sequence, if any.
Variants
Link to the variant set the code point or sequence is a member of, except where a code point or sequence maps only to itself, in which case the type of that mapping is listed.
Comment
The comment as given in the XML file. However, if the comment for this row consists only of the code point or sequence name, it is suppressed in this view. By convention, comments starting with "=" denote an alias.

3 Variant Sets

Summary

Number of variant sets 16
Largest variant set 8
Variants by Type
allocatable 26
blocked 166

The following tables list all variant sets defined in this LGR, except for singleton sets. Each table lists all variant mapping pairs of the set; one per row. Mappings are assumed to be symmetric: each row documents both forward (→) and reverse (←) mapping directions. In each table, the mappings are sorted by Source value in ascending code point order; shading is used to group mappings from the same source code point or sequence.

Where the types of the forward and reverse mappings are the same, a single value is given in the Type column; otherwise the types for forward and reverse mappings, as well as comments and references, are listed above one another. For summary counts, both forward and reverse mappings are always counted separately.

In any LGR with variant specifications that are well behaved, all members within each variant set are defined as variants of each other; the mappings in each set are symmetric and transitive; and all variant sets are disjoint.
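The summary counts and the well-behavedness condition can be cross-checked with a short calculation. The following Python sketch is illustrative only and is not part of the LGR: the function names are hypothetical, the set sizes are read from the "Variant Set N" headings in this document, and the check covers a single set (disjointness across sets is not tested). Since a set of n members yields n × (n − 1) directed mappings, the total should equal the sum of the allocatable and blocked counts in the summary.

```python
from itertools import combinations

# Member counts of Variant Sets 1-16, as given in the set headings.
SET_SIZES = [5, 2, 8, 8, 2, 2, 4, 3, 2, 2, 4, 2, 2, 2, 4, 2]


def directed_mapping_count(sizes):
    """Each set of n members contributes n * (n - 1) directed mappings."""
    return sum(n * (n - 1) for n in sizes)


def is_fully_connected(pairs):
    """True if the unordered pairs link every member to every other member.

    Reading each row symmetrically, this is the condition under which the
    mappings within one set are both symmetric and transitive.
    """
    members = {cp for pair in pairs for cp in pair}
    expected = {frozenset(p) for p in combinations(sorted(members), 2)}
    return {frozenset(p) for p in pairs} == expected


# Rows of Variant Set 7 as (source, target) code points.
SET_7 = [
    (0x0641, 0x0642), (0x0641, 0x06A2), (0x0641, 0x06A7),
    (0x0642, 0x06A2), (0x0642, 0x06A7), (0x06A2, 0x06A7),
]

total = directed_mapping_count(SET_SIZES)   # compare with 26 + 166
set7_ok = is_fully_connected(SET_7)
```

Under these assumptions the total agrees with the summary (26 allocatable plus 166 blocked), and removing any row from `SET_7` makes the check fail.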

Common Legend

Source
By convention, the smaller of the two code points in a variant mapping pair.
Target
By convention, the larger of the two code points in a variant mapping pair.
Glyph
The shape displayed for source or target depends on the fonts available to your browser.
- forward
Indicates that variant Type, Ref and Comment apply to the mapping from source to target.
- reverse
Indicates that variant Type, Ref and Comment apply to the reverse mapping from target to source.
- both
Indicates that variant Type, Ref and Comment apply to both forward and reverse mapping.
Type
The type of the variant mapping. There are some predefined variant types such as “allocatable” and “blocked”, while others are defined specifically for each LGR.
Ref
One or more reference IDs (optional). A "/" separates references for reverse / forward mappings, if different.
Comment
A descriptive comment (optional). A "/" separates comments for reverse / forward mappings, if different.

Variant Set 1 — 5 Members

Source Glyph Target Glyph   Type Ref Comment
0622 آ 0623 أ blocked    
0622 آ 0625 إ blocked    
0622 آ 0627 ا allocatable   U+0622 (آ) ALEF WITH MADDA ABOVE is simplified to U+0627 (ا) ALEF in the Arabic language
blocked    
0622 آ 0672 ٲ blocked    
0623 أ 0625 إ blocked    
0623 أ 0627 ا allocatable   U+0623 (أ) ALEF WITH HAMZA ABOVE is simplified to U+0627 (ا) ALEF in the Arabic language
blocked    
0623 أ 0672 ٲ blocked    
0625 إ 0627 ا allocatable   U+0625 (إ) ALEF WITH HAMZA BELOW is simplified to U+0627 (ا) ALEF in the Arabic language
blocked    
0625 إ 0672 ٲ blocked    
0627 ا 0672 ٲ blocked    
allocatable   U+0672 (ٲ) ALEF WITH WAVY HAMZA ABOVE is simplified to U+0627 (ا) ALEF in the Kashmiri language

Variant Set 2 — 2 Members

Source Glyph Target Glyph   Type Ref Comment
0624 ؤ 0648 و allocatable   U+0624 (ؤ) WAW WITH HAMZA ABOVE is simplified to U+0648 (و) WAW in the Arabic language
blocked    

Variant Set 3 — 8 Members

Source Glyph Target Glyph   Type Ref Comment
0626 ئ 0649 ى blocked    
0626 ئ 064A ي blocked    
0626 ئ 067B ٻ blocked    
0626 ئ 06CC ی blocked    
0626 ئ 06CD ۍ blocked    
0626 ئ 06D0 ې blocked    
0626 ئ 06D2 ے blocked    
0649 ى 064A ي blocked    
0649 ى 067B ٻ blocked    
0649 ى 06CC ی blocked    
0649 ى 06CD ۍ blocked    
0649 ى 06D0 ې blocked    
0649 ى 06D2 ے blocked    
064A ي 067B ٻ blocked    
064A ي 06CC ی allocatable   The two are visually identical and the same label could be typed using one or the other based on the set language settings and keyboard layout of a user
064A ي 06CD ۍ blocked    
064A ي 06D0 ې blocked    
064A ي 06D2 ے blocked    
067B ٻ 06CC ی blocked    
067B ٻ 06CD ۍ blocked    
067B ٻ 06D0 ې blocked    
067B ٻ 06D2 ے blocked    
06CC ی 06CD ۍ blocked    
06CC ی 06D0 ې blocked    
06CC ی 06D2 ے blocked    
06CD ۍ 06D0 ې blocked    
06CD ۍ 06D2 ے blocked    
06D0 ې 06D2 ے blocked    

Variant Set 4 — 8 Members

Source Glyph Target Glyph   Type Ref Comment
0629 ة 0647 ه allocatable   In the Arabic language, U+0647 (ه) HEH may be substituted for U+0629 (ة) TEH MARBUTA. [RFC 6365]
blocked    
0629 ة 06BE ھ blocked    
0629 ة 06C0 ۀ blocked    
0629 ة 06C1 ہ blocked    
0629 ة 06C2 ۂ blocked    
0629 ة 06C3 ۃ allocatable   The two are visually identical and the same label could be typed using one or the other based on the set language settings and keyboard layout of a user. Labels in the Arabic language using U+0629 (ة) TEH MARBUTA in the final and isolated positions will be typed in other languages using U+06C3 (ۃ) TEH MARBUTA GOAL (Urdu, etc.) which is identical in isolated and has a variant glyph or identical glyph form in final position
0629 ة 06D5 ە blocked    
0647 ه 06BE ھ blocked    
0647 ه 06C0 ۀ blocked    
0647 ه 06C1 ہ allocatable   Labels in the Arabic language using U+0647 (ه) HEH in the final and isolated positions will be typed in other languages using U+06C1 (ہ) HEH GOAL (Urdu, Pashto, Saraiki, etc.) which is identical in isolated and has a variant glyph or identical glyph form in final position
0647 ه 06C2 ۂ blocked    
0647 ه 06C3 ۃ blocked    
0647 ه 06D5 ە blocked    
06BE ھ 06C0 ۀ blocked    
06BE ھ 06C1 ہ blocked    
06BE ھ 06C2 ۂ blocked    
06BE ھ 06C3 ۃ blocked    
06BE ھ 06D5 ە blocked    
06C0 ۀ 06C1 ہ blocked    
06C0 ۀ 06C2 ۂ blocked    
06C0 ۀ 06C3 ۃ blocked    
06C0 ۀ 06D5 ە allocatable   U+06C0 (ۀ) HEH WITH YEH ABOVE is simplified to U+0647 (ه) HEH in some languages (Kurdish)
blocked    
06C1 ہ 06C2 ۂ blocked    
allocatable   U+06C2 (ۂ) HEH GOAL WITH HAMZA ABOVE is simplified to U+06C1 (ہ) HEH GOAL in Urdu
06C1 ہ 06C3 ۃ blocked   This is not allocatable in either direction because, unlike the allocatable relationship between U+0647 (ه) HEH and U+0629 (ة) TEH MARBUTA due to variation in the Arabic language, Urdu and other languages using U+06C1 (ہ) HEH GOAL do not exhibit such variation with U+06C3 (ۃ) TEH MARBUTA GOAL
06C1 ہ 06D5 ە blocked    
06C2 ۂ 06C3 ۃ blocked    
06C2 ۂ 06D5 ە blocked    
06C3 ۃ 06D5 ە blocked    

Variant Set 5 — 2 Members

Source Glyph Target Glyph   Type Ref Comment
062A ت 067A ٺ blocked    

Variant Set 6 — 2 Members

Source Glyph Target Glyph   Type Ref Comment
062B ث 067D ٽ blocked    

Variant Set 7 — 4 Members

Source Glyph Target Glyph   Type Ref Comment
0641 ف 0642 ق blocked    
0641 ف 06A2 ڢ allocatable   Used interchangeably in Africa for languages using Western (African) orthography
0641 ف 06A7 ڧ blocked    
0642 ق 06A2 ڢ blocked    
0642 ق 06A7 ڧ allocatable   Used interchangeably in Africa for languages using Western (African) orthography
06A2 ڢ 06A7 ڧ blocked    

Variant Set 8 — 3 Members

Source Glyph Target Glyph   Type Ref Comment
0643 ك 06A9 ک allocatable   The two have identical shapes in initial and medial positions and are used by different language communities to refer to the same letter
0643 ك 06AA ڪ allocatable   The two have similar (interchangeable) shapes in initial and medial positions and are used by different language communities to refer to the same letter
06A9 ک 06AA ڪ allocatable   The two have similar (interchangeable) shapes in initial and medial positions and are used by different language communities to refer to the same letter

Variant Set 9 — 2 Members

Source Glyph Target Glyph   Type Ref Comment
0646 ن 06BA ں allocatable   Used interchangeably in Africa for languages using Western (African) orthography

Variant Set 10 — 2 Members

Source Glyph Target Glyph   Type Ref Comment
0679 ٹ 06BB ڻ blocked    

Variant Set 11 — 4 Members

Source Glyph Target Glyph   Type Ref Comment
067E پ 06BD ڽ blocked    
067E پ 06D1 ۑ blocked    
067E پ 0752 ݒ blocked    
06BD ڽ 06D1 ۑ blocked    
06BD ڽ 0752 ݒ blocked    
06D1 ۑ 0752 ݒ blocked    

Variant Set 12 — 2 Members

Source Glyph Target Glyph   Type Ref Comment
0683 ڃ 0684 ڄ blocked    

Variant Set 13 — 2 Members

Source Glyph Target Glyph   Type Ref Comment
068E ڎ 068F ڏ blocked    

Variant Set 14 — 2 Members

Source Glyph Target Glyph   Type Ref Comment
06A4 ڤ 06A8 ڨ blocked    

Variant Set 15 — 4 Members

Source Glyph Target Glyph   Type Ref Comment
06AB ګ 06AD ڭ blocked    
06AB ګ 06AF گ blocked   U+06AB (ګ) KAF WITH RING interchangeably used in Pashto with U+06AF (گ) GAF
06AB ګ 0763 ݣ blocked    
06AD ڭ 06AF گ blocked    
06AD ڭ 0763 ݣ blocked    
06AF گ 0763 ݣ blocked   Iraqi Arabic uses U+06AF (گ) GAF, whereas Moroccan Arabic uses U+0763 (ݣ) KEHEH WITH THREE DOTS ABOVE

Variant Set 16 — 2 Members

Source Glyph Target Glyph   Type Ref Comment
0767 ݧ 08A9 blocked    
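
As an illustration of how variant sets give rise to variant labels, the following Python sketch enumerates the labels obtained by substituting each code point with the members of its variant set. It is a simplified sketch under stated assumptions, not the normative procedure: only Variant Sets 2 and 9 are included, the names `VARIANT_SETS`, `SET_OF` and `variant_labels` are hypothetical, and variant types (blocked or allocatable) and WLE rules are ignored.

```python
from itertools import product

# Two small variant sets from this document; the other fourteen sets
# are omitted to keep the sketch short.
VARIANT_SETS = [
    {0x0624, 0x0648},   # Variant Set 2
    {0x0646, 0x06BA},   # Variant Set 9
]

# Map each code point to the set it belongs to (the sets are disjoint).
SET_OF = {cp: s for s in VARIANT_SETS for cp in s}


def variant_labels(label):
    """Return every label reachable by swapping code points within their sets.

    The result includes the original label itself.
    """
    choices = [sorted(SET_OF.get(ord(ch), {ord(ch)})) for ch in label]
    return {"".join(map(chr, combo)) for combo in product(*choices)}


labels = variant_labels("\u0646\u0648")   # NOON followed by WAW
```

A label whose code points belong to no variant set has only itself as a result. The number of generated labels grows multiplicatively with label length, which is why the LGR blocks most mappings and adds rules against mixing variants.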

4 Classes, Rules and Actions

4.1 Character Classes

The following table lists all named and implicit classes with their definition and a list of their members intersected with the current repertoire (for larger classes, this list is elided).

Name Definition Count Members or Ranges Ref Comment
implicit Tag=sc:Arab 128 {0620-063A 0641-064A 0672 0679-0681 0683-068F 0691 0693 0695-069A 069F-06A0 06A2 06A4 06A6-06AB 06AD-06B1 06B3 06B5 06BA-06BE 06C0-06C4 06C6 06CB-06D2 06D5 ...}   Any character tagged as Arabic

Legend

Members or Ranges
Lists the members of the class as code points (xxx) or as ranges of code points (xxx-yyy). Any class too numerous to list in full is elided with "...".
Tag=ttt
A named or implicit class defined by all code points that share the given tag value (ttt).
Implicit
An anonymous class implicitly defined based on tag value.
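
The "Members or Ranges" notation can be expanded mechanically. The following Python sketch is illustrative only: the function names are hypothetical, and because the class listing above is elided with "...", a set built from it is incomplete and cannot be used as the full repertoire.

```python
def parse_members(notation):
    """Expand 'XXXX' and 'XXXX-YYYY' hex tokens into a set of code points."""
    members = set()
    for token in notation.split():
        if token == "...":          # elision marker: remaining members not listed
            continue
        first, _, last = token.partition("-")
        low = int(first, 16)
        high = int(last, 16) if last else low
        members.update(range(low, high + 1))
    return members


def label_in_class(label, members):
    """True if every code point of the label belongs to the class."""
    return all(ord(ch) in members for ch in label)


# The first three entries of the implicit class, followed by the elision.
sample = parse_members("0620-063A 0641-064A 0672 ...")
```

For example, `label_in_class("\u0628\u0627\u0628", sample)` holds because BEH and ALEF fall in the first range, whereas U+0640 lies between the two listed ranges and is rejected.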

4.2 Whole label evaluation and context rules

The following table lists all named rules defined in the LGR and indicates whether they are used as trigger in an action or as context (when or not-when) for a code point or variant.

Name Regular Expression Used as Trigger Anchor Used as Context Ref Comment
leading-combining-mark (start)[∅=[[∅=\p{gc=Mn}]∪[∅=\p{gc=Mc}]]]       Default WLE rule matching labels with leading combining marks ⍟
no-mix-kaf-keheh (\u0643.*\u06A9)|(\u06A9.*\u0643)     [100] WLE Rule 1: do not mix Arabic letters KAF and KEHEH in the same label
no-mix-kaf-swash (\u0643.*\u06AA)|(\u06AA.*\u0643)     [100] WLE Rule 2: do not mix Arabic letters KAF and SWASH KAF in the same label
no-mix-alef-maksura-farsi-yeh (\u0649.*\u06CC)|(\u06CC.*\u0649)     [100] WLE Rule 3: do not mix Arabic letters ALEF MAKSURA and FARSI YEH in the same label
no-mix-heh-goal (\u0647.*\u06C1)|(\u06C1.*\u0647)     [100] WLE Rule 4: do not mix Arabic letters HEH and HEH GOAL in the same label
no-mix-heh-goal-ae (\u06C1.*\u06D5)|(\u06D5.*\u06C1)     [100] WLE Rule 5: do not mix Arabic letters HEH GOAL and AE in the same label
no-mix-heh-ae (\u0647.*\u06D5)|(\u06D5.*\u0647)     [100] WLE Rule 6: do not mix Arabic letters HEH and AE in the same label
no-mix-heh-doachashmee (\u0647.*\u06BE)|(\u06BE.*\u0647)     [100] WLE Rule 7: do not mix Arabic letters HEH and HEH DOACHASHMEE in the same label
no-mix-teh-marbuta-goal (\u0629.*\u06C3)|(\u06C3.*\u0629)     [100] WLE Rule 8: do not mix Arabic letters TEH MARBUTA and TEH MARBUTA GOAL in the same label
no-mix-noon-with-three-dots-above-yeh-with-three-dots-below (\u06BD.*\u06D1)|(\u06D1.*\u06BD)     [100] WLE Rule 9: do not mix Arabic letters NOON WITH THREE DOTS ABOVE and YEH WITH THREE DOTS BELOW in the same label
no-mix-peh-noon-with-three-dots-above (\u067E.*\u06BD)|(\u06BD.*\u067E)     [100] WLE Rule 10: do not mix Arabic letters PEH and NOON WITH THREE DOTS ABOVE in the same label
no-mix-feh-with-dot-moved-below (\u0641.*\u06A2)|(\u06A2.*\u0641)     [100] WLE Rule 11: do not mix Arabic letters FEH and FEH WITH DOT MOVED BELOW in the same label
no-mix-qaf-with-dot-above (\u0642.*\u06A7)|(\u06A7.*\u0642)     [100] WLE Rule 12: do not mix Arabic letters QAF and QAF WITH DOT ABOVE in the same label
no-mix-feh-qaf-with-dot-above (\u0641.*\u06A7)|(\u06A7.*\u0641)     [100] WLE Rule 13: do not mix Arabic letters FEH and QAF WITH DOT ABOVE in the same label
no-mix-kaf-with-ring-gaf (\u06AB.*\u06AF)|(\u06AF.*\u06AB)     [100] WLE Rule 14: do not mix Arabic letters KAF WITH RING and GAF in the same label
no-mix-kaf-with-ring-keheh-with-three-dots-above (\u06AB.*\u0763)|(\u0763.*\u06AB)     [100] WLE Rule 15: do not mix Arabic letters KAF WITH RING and KEHEH WITH THREE DOTS ABOVE in the same label
no-mix-gaf-keheh-with-three-dots-above (\u06AF.*\u0763)|(\u0763.*\u06AF)     [100] WLE Rule 16: do not mix Arabic letters GAF and KEHEH WITH THREE DOTS ABOVE in the same label
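
The no-mix rules in the table above translate almost directly into ordinary regular expressions. The following Python sketch is illustrative only and transcribes three of the sixteen rules; it assumes that a rule is triggered when its pattern occurs anywhere in the label, and the names `NO_MIX_RULES` and `triggered_rules` are hypothetical.

```python
import re

# Three of the no-mix rules from the table, transcribed as Python patterns.
NO_MIX_RULES = {
    "no-mix-kaf-keheh": r"(\u0643.*\u06A9)|(\u06A9.*\u0643)",
    "no-mix-heh-goal": r"(\u0647.*\u06C1)|(\u06C1.*\u0647)",
    "no-mix-teh-marbuta-goal": r"(\u0629.*\u06C3)|(\u06C3.*\u0629)",
}

COMPILED = {name: re.compile(pattern) for name, pattern in NO_MIX_RULES.items()}


def triggered_rules(label):
    """Return the names of the rules whose pattern occurs in the label."""
    # Assumption: unanchored matching, i.e. the pattern may occur anywhere.
    return [name for name, rx in COMPILED.items() if rx.search(label)]
```

A label containing both KAF (U+0643) and KEHEH (U+06A9), in either order, triggers `no-mix-kaf-keheh`; a label that repeats only one of the two triggers nothing.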

Legend

Used as Trigger
This rule triggers one of the actions listed below.
Used as Context
This rule defines a required or prohibited context for a code point C or variant V.
Anchor
This rule has a placeholder for the code point for which it is evaluated.
Regular Expression
A regular expression equivalent to the rule, shown in a modified notation as noted:
(... | ...) - choice
When there is more than one alternative in a rule, the choices are separated by the alternation operator (...|...).
start or end
(start) matches the start of the label; (end) matches the end of the label.
. - any code point
. matches any code point.
*, +, ?, {n,m} - count operators
* indicates 0 or more, + indicates one or more, and ? indicates up to one instance. {n,m} indicates at least n and at most m instances.
[\p{ }] - property character set
Set of all characters matching a given value for a Unicode property [\p{prop=val}]. Note: uppercase "\P" defines the complement of a property set.
∪, ∩, ∖, ∆ - set operators
Sets may be combined by set operators (∪ = union, ∩ = intersection, ∖ = difference, ∆ = symmetric difference).
∅= - empty set
Indicates that the following set is empty because of the result of set operations, or because none of its elements is part of the repertoire defined here. A rule with a non-optional empty set never matches.
⍟ - default rule
Rules marked with ⍟ are included by default and may or may not be triggered by any possible label under this LGR.
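
The default rule leading-combining-mark tests whether a label starts with a code point of General Category Mn or Mc. The following Python sketch is illustrative only and uses the standard `unicodedata` module; the function name is hypothetical. Both property sets are annotated with ∅= in the rule table because this repertoire contains no combining marks, so the rule cannot match any label drawn from the repertoire; the sketch evaluates the same condition against arbitrary strings.

```python
import unicodedata


def has_leading_combining_mark(label):
    """True if the first code point is a nonspacing or spacing combining mark."""
    return bool(label) and unicodedata.category(label[0]) in ("Mn", "Mc")
```

For instance, a string beginning with U+064B ARABIC FATHATAN (category Mn) satisfies the condition, while the same mark in second position does not.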

4.3 Actions

The following table lists the actions that are used to assign dispositions to labels and variant labels based on the specified conditions. The order of actions defines their precedence: the first action triggered by a label is the one defining its disposition.

# Condition Rule / Variant Set   Disposition Ref Comment
1 if label matches leading-combining-mark invalid   labels with leading combining marks are invalid ⍟
2 if at least one variant is in {out-of-repertoire-var} invalid   any variant label with a code point out of repertoire is invalid ⍟
3 if label matches no-mix-kaf-keheh invalid   do not mix Arabic letters KAF and KEHEH in the same label
4 if label matches no-mix-kaf-swash invalid   do not mix Arabic letters KAF and SWASH KAF in the same label
5 if label matches no-mix-alef-maksura-farsi-yeh invalid   do not mix Arabic letters ALEF MAKSURA and FARSI YEH in the same label
6 if label matches no-mix-heh-goal invalid   do not mix Arabic letters HEH and HEH GOAL in the same label
7 if label matches no-mix-heh-goal-ae invalid   do not mix Arabic letters HEH GOAL and AE in the same label
8 if label matches no-mix-heh-ae invalid   do not mix Arabic letters HEH and AE in the same label
9 if label matches no-mix-heh-doachashmee invalid   do not mix Arabic letters HEH and HEH DOACHASHMEE in the same label
10 if label matches no-mix-teh-marbuta-goal invalid   do not mix Arabic letters TEH MARBUTA and TEH MARBUTA GOAL in the same label
11 if label matches no-mix-noon-with-three-dots-above-yeh-with-three-dots-below invalid   do not mix Arabic letters NOON WITH THREE DOTS ABOVE and YEH WITH THREE DOTS BELOW in the same label
12 if label matches no-mix-peh-noon-with-three-dots-above invalid   do not mix Arabic letters PEH and NOON WITH THREE DOTS ABOVE in the same label
13 if label matches no-mix-feh-with-dot-moved-below invalid   do not mix Arabic letters FEH and FEH WITH DOT MOVED BELOW in the same label
14 if label matches no-mix-qaf-with-dot-above invalid   do not mix Arabic letters QAF and QAF WITH DOT ABOVE in the same label
15 if label matches no-mix-feh-qaf-with-dot-above invalid   do not mix Arabic letters FEH and QAF WITH DOT ABOVE in the same label
16 if label matches no-mix-kaf-with-ring-gaf invalid   do not mix Arabic letters KAF WITH RING and GAF in the same label
17 if label matches no-mix-kaf-with-ring-keheh-with-three-dots-above invalid   do not mix Arabic letters KAF WITH RING and KEHEH WITH THREE DOTS ABOVE in the same label
18 if label matches no-mix-gaf-keheh-with-three-dots-above invalid   do not mix Arabic letters GAF and KEHEH WITH THREE DOTS ABOVE in the same label
19 if at least one variant is in {blocked} blocked   any variant label containing blocked variants is blocked ⍟
20 if each variant is in {allocatable} allocatable   variant labels with all variants allocatable are allocatable ⍟
21 if any label (catch-all)   valid   catch all (default action) ⍟

Legend

{...} - variant type set
In the "Rule/Variant Set" column, the notation {...} means a set of variant types.
⍟ - default action
Actions marked with ⍟ are included by default and may or may not be triggered by any possible label under this LGR.
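
The order of the actions listed above matters: under [RFC 7940], actions are evaluated in sequence and the first one that triggers determines the disposition of the label. The Python sketch below illustrates that first-match behavior. It is not derived from the normative XML. The function names are hypothetical, only three of the sixteen no-mix rules are modeled, each no-mix rule is assumed to trigger whenever both letters occur anywhere in the label, and a label with no variant types is assumed not to trigger action 20.

```python
import unicodedata
from typing import Iterable, Optional

# Code points as assigned in the Unicode Standard.
KAF = "\u0643"
KEHEH = "\u06A9"
SWASH_KAF = "\u06AA"
ALEF_MAKSURA = "\u0649"
FARSI_YEH = "\u06CC"

# Illustrative subset of actions 3-18 (assumption: a rule triggers when
# both letters of the pair appear anywhere in the label).
NO_MIX_PAIRS = [
    ("no-mix-kaf-keheh", KAF, KEHEH),
    ("no-mix-kaf-swash", KAF, SWASH_KAF),
    ("no-mix-alef-maksura-farsi-yeh", ALEF_MAKSURA, FARSI_YEH),
]


def violated_no_mix_rule(label: str) -> Optional[str]:
    """Return the name of the first modeled no-mix rule the label violates."""
    chars = set(label)
    for name, first, second in NO_MIX_PAIRS:
        if first in chars and second in chars:
            return name
    return None


def disposition(label: str, variant_types: Iterable[str]) -> str:
    """Walk the actions in table order; the first one that triggers decides."""
    types = set(variant_types)
    # Action 1: leading combining mark (general category M*).
    if label and unicodedata.category(label[0]).startswith("M"):
        return "invalid"
    # Action 2: at least one variant is out of repertoire.
    if "out-of-repertoire-var" in types:
        return "invalid"
    # Actions 3-18: no-mix rules (subset only).
    if violated_no_mix_rule(label) is not None:
        return "invalid"
    # Action 19: at least one blocked variant.
    if "blocked" in types:
        return "blocked"
    # Action 20: every variant is allocatable (assumption: needs at least one).
    if types and types <= {"allocatable"}:
        return "allocatable"
    # Action 21: catch-all.
    return "valid"
```

Because evaluation stops at the first match, a variant label that mixes KAF and KEHEH is reported as invalid even if all of its variant mappings are of type allocatable, and a label with both blocked and allocatable variants is reported as blocked.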

5 Table of References

[0] The Unicode Standard 1.1
Any code point originally encoded in Unicode 1.1
[3] The Unicode Standard 3.0
Any code point originally encoded in Unicode 3.0
[7] The Unicode Standard 4.1
Any code point originally encoded in Unicode 4.1
[9] The Unicode Standard 5.1
Any code point originally encoded in Unicode 5.1
[11] The Unicode Standard 6.0
Any code point originally encoded in Unicode 6.0
[12] The Unicode Standard 6.1
Any code point originally encoded in Unicode 6.1
[100] RFC 5564 Linguistic Guidelines for the Use of the Arabic Language in Internet Domains
https://tools.ietf.org/html/rfc5564
[101] Omniglot Hausa
http://omniglot.com/writing/hausa.htm
[102] Omniglot Kashmiri
http://omniglot.com/writing/kashmiri.htm
[103] Omniglot Kazakh
http://omniglot.com/writing/kazakh.htm
[104] Omniglot Khowar
http://omniglot.com/writing/khowar.htm
[105] Omniglot Kirghiz
http://omniglot.com/writing/kirghiz.htm
[106] Omniglot Kurdish
http://omniglot.com/writing/kurdish.htm
[107] Omniglot Malay
http://omniglot.com/writing/malay.htm
[108] Omniglot Pashto
http://omniglot.com/writing/pashto.htm
[109] Omniglot Persian (Farsi)
http://omniglot.com/writing/persian.htm
[110] Omniglot Saraiki
http://omniglot.com/writing/saraiki.htm
[111] Omniglot Sindhi
http://omniglot.com/writing/sindhi.htm
[112] Omniglot Urdu
http://omniglot.com/writing/urdu.htm
[113] Omniglot Wolof
http://omniglot.com/writing/wolof.htm
[114] Omniglot Uyghur
http://omniglot.com/writing/Uyghur.htm
[115] Unicode, Kashmiri, Yeh
http://www.unicode.org/L2/L2009/09215-kashmiri.pdf
[116] Unicode, Chad ANT, pp. 19-20
http://www.unicode.org/L2/L2010/10288r-arabic-proposal.pdf
[117] Unicode, DPLN, p.21
http://www.unicode.org/L2/L2010/10288r-arabic-proposal.pdf
[118] Unicode, Jawi and Moroccan Arabic GAF,
http://www.unicode.org/L2/L2003/03176-gafs.pdf
[119] Unicode, Chadian, p.5
http://www.unicode.org/L2/L2010/10288r-arabic-proposal.pdf
[120] Wolof, Paul Timothy
http://paul-timothy.net/pages/ajamisenegal/primers/je_sais_le_wolofal_harmattan_20-oct-2015_a4.pdf
[121] Hausa, pp. 261-289 Warren-Rothlin, Andy (2014): West African scripts and Arabic-script orthographies in socio-political context. In Meikal Mumin, Kees (C.) H. Versteegh (Eds.): The Arabic script in Africa. Studies in the use of a writing system. Leiden, Boston: Brill (Studies in Semitic Languages and Linguistics, 71)
[122] Mandinka, Bamana, pp. 225-260 Vydrin, Valentin Feodos'evich; Dumestre, Gérard (2014): Manding Ajami samples. In Meikal Mumin, Kees (C.) H. Versteegh (Eds.): The Arabic script in Africa. Studies in the use of a writing system. Leiden, Boston: Brill (Studies in Semitic Languages and Linguistics, 71)
[123] Ethiopian, Wetter, Andreas (2006): Arabic in Ethiopia. In Kees (C.) H. Versteegh (Ed.): Encyclopedia of Arabic Language and Linguistics. Volume I. A-Ed, vol. 2. With assistance of Mushira Eid, Alaa Elgibali, Manfred Woidich, Andrzej Zaborski. Leiden: E. J. Brill, pp. 51-56.
[124] Western Arabic, Qaf with three dots above, city of Gabes, Tunisia
[125] Urdu, Heh goal with hamza above, Section 3 in
http://www.columbia.edu/~mk2580/urdu_section/handouts/izafat.pdf
[126] Urdu, Teh marbuta goal, Code point UZT 76 of Urdu Zabta Takhti 1.01, the official code page standard for Govt. of Pakistan, approved in 2001; see
http://cle.org.pk/Publication/papers/2001/uzt1.01.pdf
[127] Kurdish-Sorani, p.7 in
http://www.fas.harvard.edu/~iranian/Sorani/sorani_1_grammar.pdf
[128] Wolof, Beh with dot below and three dots above, "Wolofal Orthography" by Galen Currah, revised 20 May 2011
[129] Malay, Jawi Keyboard standard by Department of Standards Malaysia
https://en.wikipedia.org/wiki/Jawi_keyboard
(Accessed on 13 November 2015)
[130] Ajami usage, "Language planning in West Africa - who writes the script?" by Friederike Lüpke;
http://www.elpublishing.org/docs/1/02/ldd02_08.pdf
[131] Fulfulde,
http://www.silcam.org/documents/AlphabetandOrthographyStatementforFulfuldeFUBAjamiyafortheinternet.pdf
[132] Hausa, Newspapers and books published in Hausa using Arabic script,
http://aflang.humnet.ucla.edu/Hausa/Pronunciation/writing.html
[133] Kyrgyz,
http://www.ethnologue.com/language/kir
[134] Kyrgyz,
http://www.ethnologue.com/country/CN/languages
[135] Urdu,
http://www.bbc.com/urdu
[136] Wikipedia: "Kyrgyz alphabets",
http://en.wikipedia.org/wiki/Kazakh_alphabets
(Accessed on 13 November 2015)
[137] Malay, Information technology - Jawi coded character set for information interchange MS 2443:2012, Department of Standards, Malaysia.
http://www.standardsmalaysia.gov.my
[138] Pashto Academy Peshawar University
[139] Saraiki,
https://id-id.facebook.com/jhoke.saraiki
[140] Kurdish,
http://www.kurdpress.com/
[141] Internet Architecture Board (IAB), "IAB Statement on Identifiers and Unicode 7.0.0"
https://www.iab.org/documents/correspondence-reports-documents/2015-2/iab-statement-on-identifiers-and-unicode-7-0-0/
The IAB statement recommends against use of the combining Hamza, and TF-AIDN does not include combining marks, so the precomposed form needs to be included.
[142] An introduction to Latin-Script Uyghur, by Waris Abdukerim Janbaz, State Library of Victoria, 2006,
http://docplayer.net/42224797-An-introduction-to-latin-script-uyghur.html
[143] Torwali online and printed dictionaries
[144] Wikipedia: "Jawi keyboard"
https://en.wikipedia.org/wiki/Jawi_keyboard
(Accessed on 13 November 2015)
[145] Wikipedia: "Saraiki alphabet"
http://en.wikipedia.org/wiki/Saraiki_alphabet
(Accessed on 13 November 2015)
[146] ANT (Alphabet National du Tchad) is the national standard for Chad/Tchad; See Figures in L2/10-288R (used for “tr” sound as given in the table in Section 6.1.2):
http://www.unicode.org/L2/L2010/10288r-arabic-proposal.pdf
and Appendix B
[147] Wolof,
http://www.openbookpublishers.com/htmlreader/978-1-78374-062-8/11.Ngom.xhtml#_idTextAnchor144