IDN Character Validation Guidance
The IDNA protocol standard is currently in the last step of review and revision in the IETF. An informal expert panel, working as what the IETF calls a "design team," evaluated the experience gained in implementing IDNA since its introduction in 2003 and identified several key areas of future work. These were described in RFC 4690, which triggered a formal revision of the IDNA protocol. The core components of the revision effort include: a definition of valid IDN labels; an inclusion-based model that reflects how well the implications of Unicode's handling of various scripts are understood for use in IDNs (the current model is exclusion-based); elimination of confusing and non-reversible character mappings; a fix for a right-to-left error in Stringprep; and elimination of Unicode version dependencies, which permits more scripts to be used in IDNs now and in the future. The issues with the current IDN model that led to the revision work are discussed in RFC 4690.
The latest versions of the IDNA revision proposals are available through the IETF or at Patrik Fältström's site: http://stupid.domain.name/idnabis/
One of the core principles of the revision is that valid code points are defined by a procedure rather than by a table, so that the algorithm can determine the set of code points independently of the version of Unicode in use. However, as guidance for the IDN ccTLD Fast Track participants, and until the revised version of the IDNA protocol has been implemented, the following tables of IDNA valid characters, produced by running the protocol procedure on Unicode 5.2, are released:
- Characters that are valid under both IDNA2003 and IDNA2008 [TXT, 3.39 MB]
- Characters that are valid under IDNA2003 but not under IDNA2008 [TXT, 164 KB]
- Characters that are valid under IDNA2008 but not under IDNA2003 [TXT, 4 KB]
Note: The third table above does not include the codepoints that were unassigned in Unicode 3.2 and are PVALID in IDNA2008.
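One way to read the relationship between the three tables and the note above is as plain set operations. The sketch below is only an illustration of that reading: the function name partition_tables and the small integer sets are hypothetical stand-ins, not real code point data, and the way the actual tables were generated may differ.

```python
def partition_tables(valid_2003, valid_2008, unassigned_in_3_2):
    """Split code points into the three published categories.

    All three arguments are sets of integer code points. The third
    category leaves out code points that were unassigned in
    Unicode 3.2, following the note above.
    """
    both = valid_2003 & valid_2008
    only_2003 = valid_2003 - valid_2008
    only_2008 = (valid_2008 - valid_2003) - unassigned_in_3_2
    return both, only_2003, only_2008


# Made-up memberships, for illustration only (not real IDNA data).
both, only_2003, only_2008 = partition_tables(
    valid_2003={1, 2, 3},
    valid_2008={2, 3, 4, 5},
    unassigned_in_3_2={5},
)
```

In this toy input, code point 5 is valid under the newer rules only, but it is left out of the third set because it is marked as unassigned in the older Unicode version.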
Warning Notes: The content of these tables reflects only a verification against the idnabis-table document. The tables do not include any bidi verification. Further, no confusability checking has been performed between codepoints in these tables, so the tables constitute only a basic string validation. This means that if a string contains codepoints that are not valid according to these tables, further manual checks can be done, but it most likely implies a show-stopper for use of the string. In addition, even if every codepoint in a string is valid per the tables, further manual checks must still be done to ensure that the entire string poses no stability issues for the DNS.
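The basic string validation described above can be sketched as a per-codepoint lookup. The example below is a minimal sketch under stated assumptions: the line format of the published TXT files is not specified here, so the parser assumes a hypothetical layout of one hexadecimal codepoint or "XXXX..YYYY" range per line with optional "#" comments, and SAMPLE_TABLE is an invented excerpt rather than data from the real tables. It performs no bidi or confusability checking.

```python
# Hypothetical excerpt; the real table files may use a different layout.
SAMPLE_TABLE = """
002D          # hyphen-minus
0030..0039    # digits
0061..007A    # lowercase a to z
"""


def parse_table(text):
    """Parse the assumed 'XXXX' or 'XXXX..YYYY' hex line format."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        start, _, end = line.partition("..")
        low = int(start, 16)
        high = int(end, 16) if end else low
        ranges.append((low, high))
    return ranges


def invalid_codepoints(label, ranges):
    """Return the codepoints in label that fall outside every range."""
    bad = []
    for ch in label:
        cp = ord(ch)
        if not any(low <= cp <= high for low, high in ranges):
            bad.append(f"U+{cp:04X}")
    return bad


ranges = parse_table(SAMPLE_TABLE)
problems = invalid_codepoints("a_b", ranges)  # underscore is not in the sample
```

An empty result means only that every codepoint appears in the table; as the warning notes say, the string as a whole still needs further manual checks.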