
WHOIS Accuracy Reporting System Update

In advance of the ICANN 53 meeting in Buenos Aires, I thought it might be helpful to provide an update on one project, the WHOIS Accuracy Reporting System (ARS), which will also be the subject of a session at the meeting.

As a quick reminder, the ARS project grew out of the recommendations of the 2012 WHOIS Policy Review Team Report. Based on these recommendations, the ICANN Board established the ARS project in November 2012, specifically to:

  1. Identify potentially inaccurate gTLD registration information;
  2. Explore using automated tools;
  3. Forward potentially inaccurate records to registrars for action; and,
  4. Publicly report on the resulting actions to encourage improved accuracy.

Following this direction, ICANN initiated a Pilot Project to test the concept of collecting, analyzing and reporting accuracy statistics for a set of gTLD WHOIS data. The Pilot Report was published in December 2014, and the report of public comments was published in April 2015. Thank you to those in the community who participated in the public comment forum or in the "All Things WHOIS" session at ICANN 52 in Singapore. The community's feedback and the operational lessons learned from the Pilot are being applied to the next stages of the ARS project.

We are currently in the first of three phases. Phase 1 will assess syntactic accuracy, Phase 2 will add operational accuracy assessments, and Phase 3, should the community decide ICANN should pursue it, will attempt to assess the accuracy of the identity of the contacts in the record.

In Phase 1, which has been underway since early April 2015, we are examining the syntax of the contact information required in a WHOIS record: the email addresses, telephone numbers and postal addresses of the Registrant, Technical (Tech) Contact and Administrative (Admin) Contact provided for WHOIS as part of a gTLD domain name registration. Syntax refers to the format of each of these components: Does the email address contain an "@" symbol? Does the telephone number include a country code? Does the postal code have the proper number of digits or characters for the specified country? To determine syntactic accuracy, we have worked with Contractual Compliance and other departments within ICANN to develop a series of criteria (or tests) that will be applied to each WHOIS record. As the community recommended in the public comment, these criteria are aligned with the version of the Registrar Accreditation Agreement (RAA) that applies to each record. You can review these tests here: Whois ARS Phase 1 Validation Criteria [PDF, 210 KB].
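To make the idea of a syntax check concrete, here is a minimal Python sketch. It is not the ARS Phase 1 criteria: the phone format ("+<country code>.<number>"), the loose email pattern and the small postal-code table are hypothetical simplifications, and the names `Contact`, `syntax_errors` and `record_errors` are illustrative only. It shows the general shape of the approach: each contact role in a record is run through a fixed set of format tests, and each failed test is recorded by name.

```python
import re
from dataclasses import dataclass

# HYPOTHETICAL simplified rules; the actual Phase 1 validation criteria differ.
# Loose email check: exactly one "@", a non-empty local part and a dotted domain.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
# Assumed phone format "+<country code>.<number>", e.g. "+1.7035555555".
PHONE_RE = re.compile(r"\+\d{1,3}\.\d{4,14}")
# Assumed postal-code patterns for a few ISO 3166 alpha-2 country codes.
POSTAL_PATTERNS = {
    "US": re.compile(r"\d{5}(-\d{4})?"),
    "CA": re.compile(r"[A-Z]\d[A-Z] ?\d[A-Z]\d"),
    "DE": re.compile(r"\d{5}"),
}


@dataclass
class Contact:
    email: str
    phone: str
    postal_code: str
    country: str  # ISO 3166 alpha-2 code


def syntax_errors(contact: Contact) -> list[str]:
    """Return the names of failed checks; an empty list means syntactically valid."""
    errors = []
    if not EMAIL_RE.fullmatch(contact.email):
        errors.append("email_format")
    if not PHONE_RE.fullmatch(contact.phone):
        errors.append("phone_format")
    pattern = POSTAL_PATTERNS.get(contact.country)
    # Countries missing from the table are skipped rather than failed.
    if pattern is not None and not pattern.fullmatch(contact.postal_code):
        errors.append("postal_code_format")
    return errors


def record_errors(record: dict[str, Contact]) -> dict[str, list[str]]:
    """Apply the checks to each contact role (e.g. Registrant, Admin, Tech)."""
    return {role: syntax_errors(contact) for role, contact in record.items()}


record = {
    "Registrant": Contact("owner@example.com", "+1.7035555555", "20001", "US"),
    "Admin": Contact("admin.example.com", "7035555555", "2000", "US"),
}
print(record_errors(record))
```

Here the Registrant contact passes every check, while the Admin contact fails all three (no "@", no leading "+" country code, and a four-digit US postal code). A real implementation would also need to select rules according to the applicable RAA version, which this sketch leaves out.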

As of today, we have collected the data, established scoring criteria, and are preparing to perform the accuracy testing. When testing concludes, we will analyze the data and work to present it in a meaningful way in a final report on Phase 1, to be published in August. To assist with accuracy testing and data analysis, we have engaged a team of capable vendors, including DigiCert, the Universal Postal Union, and NORC. You can expect the final report to provide:

  1. Accuracy rates for various segments of the gTLD population: new gTLDs, prior gTLDs, 2009 RAA records and 2013 RAA records;
  2. The most common errors causing records to be inaccurate in these segments, and any other themes or trends observed; and,
  3. Where relevant, regional statistics on accuracy.
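The segment-level figures listed above come down to grouping and counting. The following minimal Python sketch assumes each tested record has already been tagged with the segments it belongs to and the names of any checks it failed; the sample data, the `summarize` function and the check names are hypothetical. Counts are unweighted, so any sampling weights a statistical analysis might apply are ignored here.

```python
from collections import Counter, defaultdict

# Each HYPOTHETICAL result pairs the segments a record belongs to with the
# names of the checks it failed (an empty list means the record passed).
results = [
    (("New gTLD", "2013 RAA"), []),
    (("New gTLD", "2013 RAA"), ["phone_format"]),
    (("Prior gTLD", "2009 RAA"), ["phone_format", "email_format"]),
    (("Prior gTLD", "2013 RAA"), []),
]


def summarize(results, top_n=3):
    """Per-segment accuracy rate and most common errors (unweighted counts)."""
    totals = Counter()
    accurate = Counter()
    errors = defaultdict(Counter)
    for segments, failed in results:
        for segment in segments:  # a record counts toward every segment it is in
            totals[segment] += 1
            if not failed:
                accurate[segment] += 1
            errors[segment].update(failed)
    return {
        segment: {
            "accuracy_rate": accurate[segment] / totals[segment],
            "top_errors": errors[segment].most_common(top_n),
        }
        for segment in totals
    }


summary = summarize(results)
for segment, stats in summary.items():
    print(segment, round(stats["accuracy_rate"], 2), stats["top_errors"])
```

Because a record can belong to both a TLD segment and an RAA segment, it is counted once in each, so segment totals overlap by design.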

Following this report, records found to be noncompliant will be provided to ICANN Compliance for follow-up. Where applicable, ICANN Compliance will provide this information to registrars for investigation and, as necessary, corrective action.

In parallel with preparing the Phase 1 report, we will launch Phase 2 of the project. Phase 2 will test "operational" criteria in addition to the syntax criteria tested in Phase 1. Examples of operational criteria include: Is an email delivered without bouncing back to the sender? Does the phone ring once the number has been dialed completely (and not before dialing is complete)? For Phase 2, we are targeting a final report in December 2015. From then on, we will work to produce two operational reports each year, one in June and one in December. After each reporting cycle, data regarding any potentially inaccurate records will be provided to ICANN Contractual Compliance for further action, including follow-up with registrars for investigation and correction of inaccurate data.
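One way to picture the email criterion is to classify the SMTP reply codes returned during delivery attempts. The sketch below is an assumption about how such a check could work, not a description of the Phase 2 method; the function names and the pass/fail policy are hypothetical. Only the reply-code classes come from RFC 5321 (2yz positive completion, 3yz positive intermediate, 4yz transient failure, 5yz permanent failure).

```python
def classify_smtp_reply(code: int) -> str:
    """Map an SMTP reply code to an outcome using its first digit (RFC 5321)."""
    first_digit = code // 100
    if first_digit == 2:
        return "delivered"
    if first_digit == 4:
        return "transient_failure"  # temporary; worth retrying before judging
    if first_digit == 5:
        return "bounced"  # permanent rejection
    return "inconclusive"  # e.g. 3yz intermediate replies


def email_operational_result(reply_codes: list[int]) -> str:
    """HYPOTHETICAL policy: pass on any acceptance, fail on a permanent
    rejection, otherwise leave the address unresolved."""
    outcomes = [classify_smtp_reply(code) for code in reply_codes]
    if "delivered" in outcomes:
        return "pass"
    if "bounced" in outcomes:
        return "fail"
    return "inconclusive"


print(email_operational_result([421, 250]))
print(email_operational_result([421, 550]))
print(email_operational_result([421]))
```

Note that a server can accept a message and still bounce it later with a separate non-delivery report; this sketch only looks at the replies received during the attempt and does not model such delayed bounces.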

For Phase 3, as discussed in the report of public comments on the Pilot, more community discussion is required on how identity verification could be done and whether it should be performed at all.

We look forward to sharing more details about Phases 1 and 2 at the session in Buenos Aires, and we hope you will join us.
