Alternate Path to Delegation Report for .axa

Eligibility for Alternate Path to Delegation

TLD "axa" is eligible for the Alternate Path to Delegation as described in the ICANN New gTLD Collision Occurrence Management plan. [1]

Second Level Domains (SLDs)

A total of 3357 unique applicable SLDs were detected in the eight DNS-OARC "Day In The Life" ("DITL") datasets [2] collected from 2006 to 2013 and in the 2010 DNSSEC rollout datasets (hereafter, the "input datasets"). Pursuant to the ICANN New gTLD Collision Occurrence Management plan, if the Registry Operator chooses to block [3] all of these strings, its proposed TLD may proceed to delegation in advance of the forthcoming Collision Occurrence Assessment.

The list of SLD Strings that must be blocked is available here.

Strings appearing in the input datasets that are not valid hostnames as defined in RFC 1123 ("LDH Rule") and are not valid A-labels as defined in RFC 5890 were not included in the block list. Furthermore, the contractually required SLD "nic" will not appear in the block list.
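The validity filter described above can be sketched as follows. This is a minimal illustration, not the code used to produce the report: the function names are hypothetical, and the check is purely syntactic (an A-label such as "xn--11b5bs1di" passes because it satisfies the LDH rule; its Punycode payload is not validated against RFC 5890).

```python
import re

# One DNS label under the LDH rule: 1-63 characters drawn from letters,
# digits, and hyphens, with no leading or trailing hyphen.
_LDH_LABEL = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)")


def is_ldh_label(label: str) -> bool:
    """Return True if the label is syntactically valid under the LDH rule."""
    return _LDH_LABEL.fullmatch(label.lower()) is not None


def eligible_for_block_list(label: str) -> bool:
    """Keep syntactically valid labels, except the contractually required "nic"."""
    label = label.lower()
    return is_ldh_label(label) and label != "nic"
```

For example, "mail" and "xn--11b5bs1di" would be kept, while "my_host", "-x", and "nic" would be dropped.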

The block list was determined as follows:
  1. List all unique strings at the SLD position in DNS requests where the applied-for string is in the TLD position in all input datasets;
  2. Filter the strings at the SLD position as described above;
  3. Remove the "Chrome 10" strings at the leftmost query position on a best effort basis (see Methodology section below).

The remaining SLD strings comprise the block list.
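Step 1 of the procedure above, collecting the unique strings at the SLD position, might look like the sketch below. It assumes the query names have already been extracted from the PCAP files as plain strings; the function names are hypothetical and are not taken from the published source code.

```python
from typing import Iterable, Optional, Set


def sld_for_tld(qname: str, tld: str) -> Optional[str]:
    """Return the label at the SLD position if qname ends in the given TLD."""
    labels = qname.rstrip(".").lower().split(".")
    # Require at least two labels and a non-empty label at the SLD position.
    if len(labels) >= 2 and labels[-1] == tld.lower() and labels[-2]:
        return labels[-2]
    return None


def unique_slds(qnames: Iterable[str], tld: str) -> Set[str]:
    """Collect the unique SLD strings seen under the TLD across all queries."""
    found = set()
    for qname in qnames:
        sld = sld_for_tld(qname, tld)
        if sld is not None:
            found.add(sld)
    return found
```

Note that a query such as "www.intranet.axa." contributes "intranet", the label immediately to the left of the TLD, regardless of how many labels precede it.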

Methodology

Data and Source Code

DNS-OARC data was re-processed from the raw PCAP [4] files provided by participating DNS Root Server Operators as part of the "Day In The Life" and 2010 DNSSEC rollout data collection programs. Source code and procedures to process the raw files are available on GitHub. [5] Because processing these files is resource intensive, DNS-OARC members are invited to use the intermediary files located here (/home/kwhite/jas/gtld/jas) [6] for their own analysis and research.

SLD Strings Excluded from the Block List

A significant proportion of the queries appear to be randomly generated 10-character alphabetic [7] strings used by the Google Chrome browser to detect certain aspects of DNS resolver behavior. [8] While there appear to be numerous varieties and sources of random/algorithmically-generated strings in the input datasets, the 10-character Chrome queries appear to present minimal risk if filtered from the block lists.

The "Chrome 10" strings come in triplets as described in the Chromium source code. Only strings that are seen coming in the triplet sequence from the same IP address are eliminated.
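One simplistic way to picture the triplet rule is sketched below. This is an illustration only, not the detection algorithm used for this report. It assumes each query is available as a (timestamp, source IP, leftmost label) tuple, that Chrome labels are ten lowercase letters, and that a "triplet" means three distinct such labels from one source within a short time window. The window length and all names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed shape of a Chrome probe label: exactly ten ASCII letters.
CHROME_LABEL = re.compile(r"[a-z]{10}")


def chrome_triplet_labels(queries, window=2.0):
    """Flag 10-letter labels that arrive as a triplet from one source IP.

    queries: iterable of (timestamp_seconds, src_ip, leftmost_label).
    window:  hypothetical maximum spread, in seconds, of one triplet.
    """
    by_ip = defaultdict(list)
    for ts, ip, label in queries:
        label = label.lower()
        if CHROME_LABEL.fullmatch(label):
            by_ip[ip].append((ts, label))

    flagged = set()
    for hits in by_ip.values():
        hits.sort()  # order each source's candidate labels by time
        for i in range(len(hits) - 2):
            trio = hits[i:i + 3]
            distinct = {label for _, label in trio}
            # Three different candidates close together in time form a triplet.
            if len(distinct) == 3 and trio[2][0] - trio[0][0] <= window:
                flagged.update(distinct)
    return flagged
```

Under these assumptions, a lone 10-letter label such as "mailserver" from a single source is not flagged, because it never appears as part of a triplet.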

While "randomness" is relatively easy for humans to detect, it is remarkably hard for machines. However, because the datasets are so dominated by these labels, for which blocking adds no practical value, significant effort has been made to detect and exclude them.

We engaged expert data scientists to develop a robust mechanism to detect these random strings. The following is a high-level description of the algorithm they developed.

Only 10-character labels consistent with the format described in the Chromium source code are subjected to "random detection." [9]

Parameters were selected and algorithms were tuned with an English dictionary. [10] "Randomness" of each label is determined only after analyzing the entire dataset and performing a statistical analysis of the labels and multiple substrings depending on the individual characteristics of the label.

To validate and tune the algorithm, we ran 84 individual experiments for a total of 851 CPU hours on the DITL 2013 and 2012 datasets. The quality of the random recognizer was confirmed with the following tests:

1) Test #1: An English dictionary was used to count the number of false positives detected. The ratio of "RANDOM_YES predictions that hold an English word" to "the total number of RANDOM_YES predictions" was calculated. This test verifies that a RANDOM_YES will not have English words embedded in it. Manual inspection of borderline strings very often reveals English words like "host," "mail," and "server" embedded in strings, so this test verified the random recognizer's performance in those situations. An error rate of less than 0.2% was observed following manual inspection.

2) Test #2: Results of the random detector were compared to a simplistic detection of the Chrome random strings. An error rate of less than 0.8% was observed following manual inspection.
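The ratio calculated in Test #1 can be illustrated with a short sketch. The function name, the minimum word length, and the sample inputs are assumptions for illustration. The report does not state how embedded words were matched, so a plain substring search is used here.

```python
def embedded_word_ratio(random_yes_labels, dictionary, min_len=4):
    """Fraction of RANDOM_YES labels that contain a dictionary word.

    min_len is a hypothetical cutoff that ignores very short words, which
    would otherwise match almost any string by chance.
    """
    words = {w.lower() for w in dictionary if len(w) >= min_len}
    labels = [label.lower() for label in random_yes_labels]
    if not labels:
        return 0.0

    def has_word(label):
        # Check every substring of at least min_len characters.
        return any(
            label[i:j] in words
            for i in range(len(label))
            for j in range(i + min_len, len(label) + 1)
        )

    return sum(has_word(label) for label in labels) / len(labels)
```

With the sample labels "qwkzjvhbtp", "xxmailxxqz", "pzvkcjyhdg", and "zzhostzzqq" and a dictionary containing "mail", "host", and "server", two of the four labels contain a word, giving a ratio of 0.5.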


[1] http://www.icann.org/en/groups/board/documents/resolutions-new-gtld-annex-1-07oct13-en.pdf

[2] https://www.dns-oarc.net/oarc/data/ditl

[3] http://www.icann.org/en/topics/idn/idn-vip-integrated-issues-final-clean-20feb12-en.pdf, Section 5.

[4] PCAP is a binary network packet capture file format

[5] https://github.com/JASdevteam/dns-oarc

[6] DNS-OARC may move these intermediary files to a permanent location at some future point.

[7] The length is hard-coded in the Chromium source (IntranetRedirectDetector::kNumCharsInHostnames = 10).

[8] See comments in source code here

[9] Chromium Source

[10] Of course there are non-English strings in the labels, but many are English or English-derived strings like "proxy" and "host" that are common internationally. An English dictionary was a good validation tool, but in the end "randomness" is not determined by existence in an English dictionary.
