Frequently Asked Questions: ICANN’s Domain Abuse Activity Reporting (DAAR) Project

ICANN's Domain Abuse Activity Reporting (DAAR) project is a system for studying and reporting on domain name registration and security threat (domain abuse) behavior across top-level domain (TLD) registries and registrars. The overarching purpose of DAAR is to report security threat activity to the ICANN community, which can then use the data to facilitate informed policy decisions. DAAR was designed to provide the ICANN community with a reliable, persistent, and reproducible set of data from which security threat (abuse) analyses could be performed. The system collects TLD zone data, a very large body of registration data, and complements these data sets with a large set of high-confidence reputation (security threat) data feeds. The data collected by the DAAR system can serve as a platform for studying or reporting daily or historical registration or abuse activity.

What is the Domain Abuse Activity Reporting (DAAR) Project?
Why did ICANN's Office of the CTO (OCTO) develop DAAR?
What is the purpose of DAAR?
What types of security threats does DAAR observe?
How does DAAR compile threat data?
What reputation data does DAAR use?
How reliable is DAAR's data?
How up-to-date is the data?
Does DAAR find all instances of the abuse in the DNS?
Does DAAR list only domains that were registered by malicious parties?
How does DAAR deal with false positives in reputation data?
Who will have access to DAAR?
What is the relationship between DAAR and the Open Data Initiative?
What role will DAAR play in ICANN policy?
How does DAAR fall into ICANN's remit?
List of Reputation Data Providers and data feeds

What is the Domain Abuse Activity Reporting (DAAR) Project?

DAAR is a platform for studying domain name registration and abuse behavior across top-level domain (TLD) registries and registrars. The system has two major components:

  • Collection system – Gathers the zone files of every TLD for which we are able to obtain data, compiles domain abuse data from independent security threat-reporting sources, obtains registration data for the domains, and associates security threat activity to individual TLDs or ICANN accredited registrars.
  • Graphical user interface (GUI) administration system – Provides tabular and graphical visualizations of domain registration and abuse activities, including the display of historical data. The GUI allows ICANN staff administrators to study registration or abuse activities and to export data for report generation.

Combined, the two systems give views of one or more days in the life of domain registration services – including the abuse of registered domain names.
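The association step described above can be pictured as a join across the three data sets. The following is only a minimal sketch under stated assumptions, not DAAR's actual implementation: the function names (`tld_of`, `attribute`), the sample data, and the choice to ignore listed domains that are absent from the current zone are all hypothetical. It also assumes that listings have already been reduced to registered domain names.

```python
from collections import Counter

# All data below is invented for illustration only.
zone_domains = {"shop.example", "mail.example", "news.test"}   # from TLD zone files
registrar_of = {                                               # from registration data
    "shop.example": "Registrar A",
    "mail.example": "Registrar B",
    "news.test": "Registrar A",
}
listed_domains = {"shop.example", "news.test", "gone.example"}  # from reputation feeds


def tld_of(domain: str) -> str:
    """Return the rightmost label of a domain name, lowercased."""
    return domain.rstrip(".").rsplit(".", 1)[-1].lower()


def attribute(listed, zone, registrars):
    """Tally listed domains per TLD and per registrar.

    Assumption: a listed domain is counted only if it appears in the zone data.
    """
    by_tld, by_registrar = Counter(), Counter()
    for domain in listed & zone:
        by_tld[tld_of(domain)] += 1
        by_registrar[registrars.get(domain, "unknown")] += 1
    return by_tld, by_registrar


by_tld, by_registrar = attribute(listed_domains, zone_domains, registrar_of)
print(dict(by_tld), dict(by_registrar))
```

In this sketch, `gone.example` is listed by a feed but is not in the zone data, so it is attributed to neither a TLD nor a registrar.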

Why did ICANN's Office of the CTO (OCTO) develop DAAR?

Efforts to study domain name abuse are relatively common today, but they often have limitations:

  • Few efforts study abuse across all gTLD delegations.
  • Studies do not typically use large numbers of reputation data sets.
  • Studies generally do not assess multiple security threats.
  • Few efforts warehouse enough data over time to provide a basis for conducting historical data analyses.
  • Many efforts studying domain name abuse are done in the context of products or services, which can lead to biases on how the security threat-related data are collected and displayed.
  • Perhaps most importantly, the methodologies and data sources for these studies are often not disclosed in enough detail, so study results cannot always be reproduced.

After many informal requests from the community, OCTO concluded that the ICANN community would benefit from having a neutral, unbiased, persistent, and reproducible set of data from which analyses could be performed. OCTO began a research project to develop a system to collect a very large body of registration data, complemented by a large set of high-confidence reputation data feeds. The idea was that the data collected by this system could serve as a platform for studying daily or historical registration or abuse activities and for reporting activity.

What is the purpose of DAAR?

The overarching purpose of DAAR is to report security threat activity to the ICANN community, which can use the data to make informed decisions. Within this broad framework, DAAR has many specific goals:

  • Facilitate registry, registrar, or industry investigation of malicious domain name registrations.
  • Assist in domain name anti-abuse efforts.
  • Allow determination of and reporting on the presence or prevalence of security threats at a registry or accredited registrar.
  • Isolate or assist in identifying causes of abnormal registration activities.
  • Track security threat activities at a registry or registrar level over time.
  • Track domain registration activities (adds, deletes) over time.
  • Corroborate claims of registrant, registrar, and/or registry contractual violations presented to the ICANN organization.
  • Support the ICANN community's consumer confidence and trust activities.
  • Assist the ICANN organization's Contractual Compliance department in obtaining additional information relating to a complaint filed against an accredited registrar or TLD registry operator (if requested).

What types of security threats does DAAR observe?

DAAR identifies and tracks domain names associated with four kinds of abuse:

  • Phishing. Domain names that support web pages that masquerade as a trustworthy entity such as a bank or online merchant.
  • Malware. Domain names that facilitate the hosting and/or spreading of hostile or intrusive software that is installed on end systems, potentially without the permission of the user.
  • Botnet command-and-control. Domain names that are used to identify hosts that control botnets, which are collections of malware-infected computers that can be used to perpetrate various abusive activities such as denial of service, spam, etc.
  • Spam. Domains that are advertised in unsolicited bulk email or used to name spam mail exchange systems. Spam is no longer merely unsolicited bulk email: it has become a major means of delivery for identifiers (domain names, hyperlinks, or addresses) used to support the above-listed security threats.

How does DAAR compile threat data?

DAAR does not work in isolation. The system does not generate threat data. It relies on open or commercial reputation data to identify and classify the four types of abuse mentioned above. The reputation feeds DAAR uses are selected on several criteria: accuracy, coverage, industry adoption, and the feed's ability to classify events into the abuse classes that DAAR tracks.

If a domain is listed for two or more types of abuse, that domain will be counted in each relevant abuse category. However, only unique domains are counted for the abuse domain total in the TLD or registrar portfolio, and for scoring purposes.
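The counting rule above can be illustrated with a short sketch. This is an illustration under assumptions rather than DAAR's code: the function name `count_abuse` and the sample listings are hypothetical, and it assumes that a domain reported by several feeds for the same category is counted once in that category.

```python
from collections import defaultdict

# Hypothetical (domain, abuse category) pairs as they might arrive from feeds.
listings = [
    ("bad.example", "phishing"),
    ("bad.example", "spam"),
    ("worse.example", "malware"),
    ("bad.example", "spam"),  # same domain and category, reported by a second feed
]


def count_abuse(listings):
    """Return (per-category domain counts, number of unique abuse domains)."""
    domains_by_category = defaultdict(set)
    for domain, category in listings:
        domains_by_category[category].add(domain)
    # A domain listed for several abuse types counts once in each category...
    per_category = {cat: len(doms) for cat, doms in domains_by_category.items()}
    # ...but only once in the overall total.
    unique_total = len(set().union(*domains_by_category.values()))
    return per_category, unique_total


per_category, unique_total = count_abuse(listings)
print(per_category, unique_total)
```

Here the per-category counts sum to 3 while the unique total is 2, because `bad.example` is listed for both phishing and spam.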

What reputation data does DAAR use?

We believe that it is beneficial to collect the same security threat (abuse) data that is reported to industry and Internet users. Security systems such as anti-spam or anti-malware gateways or firewalls that protect billions of users incorporate these data into their threat mitigation measures. DAAR reports thus reflect how the user and network operations communities see the domain name ecosystem through the lens of threat data.

DAAR incorporates a large number of reputation feeds; see the list of feeds and providers at the end of this Q&A for the feeds in use as of this writing. Collectively, these feeds give multiple sources for the security threats that DAAR can measure or analyze. DAAR is designed to be extensible – to ensure quality data, or to assess security threats that the ICANN community may identify in the future, data feeds from reputation providers may be added or removed over time.

How reliable is DAAR's data?

DAAR uses three categories of data: zone data, registration data, and reputation data.

DAAR collects TLD zone data daily, through ICANN's Centralized Zone Data Service and/or directly through agreements with TLD operators. Zone data, the availability of which is contractually mandated for generic top-level domains (gTLDs) and volunteered by country code top-level domains (ccTLDs), are generally provided once a day. As such, DAAR will not observe zone changes made after the daily release of zone data until the following day.

Domain name registration data come from the WHOIS service provided by registries and registrars. The DAAR system does not verify the validity or accuracy of registration data.

DAAR collects reputation data from providers that the OCTO team has selected based on the providers' industry reputations for accuracy (defined here as having near-consensus adoption across the operational security community). The providers must have clearly defined processes for adding and removing identified domain names from their feeds. Another selection criterion is the prevalence of use of the feed – by academia in research papers and theses and by industry in products and services. Finally, the feed must support at least one of the four abuse classifications that DAAR tracks.

How up-to-date is the data?

The data used by DAAR is updated each day. The domain counts are collected each day from fresh TLD zone files. Some registry operators only grant zone file access for limited periods before requiring renewal, and this creates occasional gaps. The reputation providers continually add domains to their lists. The system collects these updated data from each provider several times per day. Each provider also has a procedure for removing domains from their lists, and these removals are tracked and accounted for within DAAR. Generally, each provider lists a domain name or URL for as long as the provider believes the domain constitutes a problem, after which the domain is removed. A domain name may be listed only for minutes, or for months, depending upon the provider's policies and criteria.

There are a few lists that do not track abuse status and therefore do not provide "removal" flags. One example is the Anti-Phishing Working Group (APWG) phishing URL feed. This is a list of newly confirmed phishing URLs, but the APWG does not then track all of those URLs to see which sites are up and which are down. To be conservative, when a domain is listed in the APWG feed, we only count that domain as "listed" or "active" for one day.
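The two listing lifetimes described in this answer can be sketched as a single check. This is a hypothetical illustration, not DAAR's implementation: the function `is_listed_on` and its parameters are assumptions, as is the reading that "one day" means the day the listing was added and that a removal takes effect on the removal date.

```python
from datetime import date
from typing import Optional


def is_listed_on(day: date, added: date, removed: Optional[date],
                 has_removal_flags: bool) -> bool:
    """Return True if a listing counts as active on `day`."""
    if day < added:
        return False
    if not has_removal_flags:
        # The feed never reports removals: count the listing for one day only.
        return day == added
    # The feed reports removals: active until the provider removes the entry.
    return removed is None or day < removed


# A feed without removal flags: active on the listing day only.
print(is_listed_on(date(2017, 7, 1), date(2017, 7, 1), None, False))
print(is_listed_on(date(2017, 7, 2), date(2017, 7, 1), None, False))
```

Under these assumptions, a listing from a feed that does report removals stays active for as long as the provider keeps it, whether that is minutes or months.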

Does DAAR find all instances of the abuse in the DNS?

No. The DAAR system collects abuse data from multiple reputation service providers; however, these providers do not (and do not claim to) see or list all instances of abuse that happen on the Internet. DAAR therefore provides a baseline measurement, and the amount of abuse associated with domain names is larger than what this system catalogues. Users of DAAR data should assume that the statistics it presents are a subset of the abuse problem in a given TLD, or in the TLD portfolio of a given registrar.

Does DAAR list only domains that were registered by malicious parties?

No. In general, reputation providers cannot definitively attribute motives to actors registering domain names. Likewise, DAAR is not likely to know the motives for registering domain names. DAAR relies on reputation service providers that use modern abuse detection. Some providers' lists may also contain domain names for which the hosting service has been compromised, resulting in the domain being used for malicious purposes. As both maliciously registered and compromised domains pose security threats to Internet users, both are deemed relevant for DAAR's purposes.

How does DAAR deal with false positives in reputation data?

Based on independent review, false positive rates are low among the lists we have chosen. DAAR does not modify the data received from reputation feeds, so if the feeds include false positives, those false positives are reflected in DAAR's output. However, since numerous parties rely on these reputation feeds – for example, email service providers, Internet service providers, and resolver operators – any false positives will affect the domain name ecosystem regardless of how DAAR reports them. As such, filtering out false positives within the DAAR system would make its reports diverge from the data those parties actually act on, and would misrepresent the impact of reported abuse as they experience it.

Who will have access to DAAR?

Only ICANN staff and contracted developers can access DAAR directly through its administrative interface. OCTO will be working with the ICANN community to determine the best way to share the statistics and analyses derived from data that DAAR collects.

What is the relationship between DAAR and the Open Data Initiative?

The Open Data Initiative is an umbrella term for efforts aimed at making it easy for anyone to access data that the ICANN organization or community creates or curates. DAAR uses data from public, open, and/or commercial sources. DNS zone data and WHOIS registration data are publicly available. Certain reputation data sources are open source, whereas others are commercial feeds requiring a license or subscription. For some commercial feeds, licensing permits derivative but not direct use. In cases where there are no limitations on redistribution of DAAR-related data, these data and analyses will be published periodically and included in the Open Data Initiative.

What role will DAAR play in ICANN policy?

The purpose of DAAR is to provide verifiable and reproducible data to facilitate analyses that could be useful in making informed consensus policy decisions. DAAR assembles a composite of the domain name reputation data that the operational security community observes, reports, and uses. It is up to the ICANN community to determine whether or how to use the reports derived from DAAR-collected data in policy deliberations.

How does DAAR fall into ICANN's remit?

For ICANN to help ensure the security and stability of the top level of the Internet's system of unique identifiers that it coordinates according to its mission, both the ICANN organization and community must be aware of threats to that system. In keeping with ICANN's requirements for openness and transparency, the organization must, as much as possible, make the data it collects available to the community. Finally, the role of the ICANN organization in general, and the Office of the CTO in particular, is to provide neutral, unbiased data and analyses to facilitate policy discussions and development.

List of Reputation Data Providers and data feeds

As of July 2017, DAAR incorporates the following blocklists.

  • Spamhaus Domain Blocklist (DBL). Domains advertised in spam, domains used for phishing, domains used to support malware, and a list of botnet Command and Control (C&C) domains.
  • SURBL. Domains advertised in spam, domains used for phishing, and domains used to support malware.
  • Anti-Phishing Working Group. URL blocklist feed: domains used for phishing.
  • Phishtank. Domains used for phishing.
  • Malware Patrol. Domains used to support malware. In addition, Malware Patrol's feed incorporates listings from these malicious domain blocklists:
    • SpamAssassin
    • Carbon Black Malicious Domains
    • Postfix Mail Transfer Agent (MTA)
    • Squid Web Proxy
    • Smoothwall
    • Symantec Email Security for SMTP
    • Symantec Web Security
    • Firekeeper
    • DansGuardian
    • Ransomware botnet C&C server IPs
    • Mailwasher
    • Mozilla Firefox Adblock
  • Ransomware Tracker. Malware botnet C&C servers.
  • Feodotracker. Domains used to support malware.