Frequently Asked Questions: ICANN’s Domain Abuse Activity Reporting (DAAR) Project

What is the Domain Abuse Activity Reporting (DAAR) Project?
Why did ICANN org develop DAAR?
What is the purpose of DAAR?
What types of security threats does DAAR observe?
How does DAAR compile threat data?
What reputation data does the DAAR system use?
How reliable is DAAR's data?
How up-to-date is the data?
Does DAAR find all instances of the security threats in the DNS?
Does DAAR list only domains that were registered by malicious parties?
How does DAAR deal with false positives in reputation data?
Who will have access to DAAR?
What is the relationship between DAAR and the Open Data Initiative?
What role will DAAR play in ICANN policy?
How does DAAR fall into ICANN's remit?
How can I provide input on DAAR?
List of Reputation Data Providers and data feeds

What is the Domain Abuse Activity Reporting (DAAR) Project?

DAAR is a platform for studying domain name registration and security threat (abuse) behavior across top-level domain (TLD) registries and registrars. The system has two major components:

  • Collection system – Gathers zone files of every TLD for which we are able to obtain data, compiles domain abuse data from independent security threat-reporting sources, and associates security threat activity with individual TLDs.
  • Graphical user interface (GUI) administration system – Provides tabular and graphical visualizations of domain registration and abuse activities, including the display of historical data. The GUI allows ICANN staff administrators to study security threat activities and to export data for report generation.

Combined, the two systems provide views of one or more days in the life of domain registration services – including the abuse of registered domain names.

Why did ICANN org develop DAAR?

Efforts to study domain name abuse are relatively common today, but they often have limitations:

  • Few efforts study abuse across all generic Top-Level Domains and over time.
  • The majority of work in this area focuses on only one specific type of security threat and does not assess multiple security threats.
  • Perhaps most importantly, the exact methodologies and data sources used for these studies are often not disclosed to the public or to registries, so study results cannot always be reproduced.

After many informal requests from the community, ICANN's Office of the CTO (OCTO) concluded that the ICANN community would benefit from having a neutral, unbiased, persistent, and reproducible methodology and a set of anonymized data from which analyses could be performed. OCTO's Security, Stability and Resiliency (SSR) Team began a research project to develop a system to collect a very large body of domain name data, complemented by a large set of high-confidence reputation data feeds.

What is the purpose of DAAR?

The overarching purpose of DAAR is to report security threat activity to the ICANN community, which can use the data to make informed decisions. Within this broad framework, DAAR has many specific goals:

  • Track the security threat reputation of TLDs over time, based on well-known threat reputation datasets.
  • Assist in domain name anti-abuse efforts by making the DAAR methodology public.
  • Allow determination of and reporting on the presence or prevalence of security threats at a registry.
  • Assist registries or registrars in identifying causes of abnormal registration activities.
  • Support the ICANN community's consumer confidence and trust activities.

What types of security threats does DAAR observe?

DAAR identifies and tracks reported domain names associated with four kinds of security threats:

  • Phishing. Domain names that support web pages that masquerade as a trustworthy entity such as a bank, known brand, online merchant or government agency.
  • Malware. Domain names that facilitate the hosting and/or spreading of hostile or intrusive software that is installed on end systems, potentially without the permission of the user.
  • Botnet command-and-control. Domain names that are used to identify hosts that control botnets, which are collections of malware-infected computers that can be used to perpetrate various abusive activities such as launching denial-of-service attacks and sending spam email or phishing campaigns, among others.
  • Spam. Domains that are advertised in unsolicited bulk email or used to name spam mail exchange systems. Spam is no longer only unsolicited bulk email; it has become a major means of delivery for identifiers (domain names, hyperlinks, or addresses) used to support the above-listed security threats.

How does DAAR compile threat data?

DAAR does not work in isolation. The system does not generate threat data. It relies on open or commercial reputation data to identify and classify the four types of security threats mentioned above. The reputation feeds that DAAR uses are selected against several criteria: accuracy, coverage, industry adoption, and the feed's ability to classify events into the security threat classes that DAAR tracks.

If a domain is listed for two or more types of threat, that domain will be counted in each relevant threat category. However, only unique domains are counted for the total security threat domains in the TLD or registrar portfolio, and for scoring purposes.
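The counting rule above can be illustrated with a minimal Python sketch. It is not DAAR's implementation: the `count_threats` function, the `(domain, threat_type)` input shape, the name normalization, and the sample listings are all assumptions made for illustration.

```python
from collections import defaultdict


def count_threats(listings):
    """Summarize (domain, threat_type) listings for one TLD or registrar portfolio.

    Returns a dict of unique-domain counts per threat category and the
    number of unique domains across all categories.
    """
    domains_by_threat = defaultdict(set)
    all_domains = set()
    for domain, threat in listings:
        # Assumed normalization: lowercase and strip a trailing dot.
        name = domain.lower().rstrip(".")
        domains_by_threat[threat].add(name)
        all_domains.add(name)
    per_category = {t: len(d) for t, d in domains_by_threat.items()}
    return per_category, len(all_domains)


# Hypothetical listings: example.test is reported for two threat types.
listings = [
    ("example.test", "phishing"),
    ("example.test", "malware"),
    ("other.test", "spam"),
    ("Other.test.", "spam"),  # repeat report of the same domain
]
per_category, total = count_threats(listings)
print(per_category)  # {'phishing': 1, 'malware': 1, 'spam': 1}
print(total)         # 2
```

Here the per-category counts sum to three, while the total is two, because example.test is counted once in each category it appears in but only once in the overall total.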

What reputation data does the DAAR system use?

We believe that it is beneficial to collect the same security threat (abuse) data that is reported to industry and Internet users. Security systems such as anti-spam or anti-malware gateways or firewalls that protect billions of users incorporate these data into their threat mitigation measures. DAAR thus reflects how the user and network operations communities see the domain name ecosystem through the lens of threat data.

DAAR incorporates a large number of reputation feeds; see the list of feeds and providers at the end of this Q&A for the feeds in use as of the date of this writing. Collectively, these feeds give multiple sources for the security threats that DAAR can measure or analyze. DAAR is designed to be extensible – to ensure quality data, and to assess security threats that the ICANN community may identify in the future. Therefore, data feeds from reputation providers may be added or removed over time.

How reliable is DAAR's data?

For now, DAAR uses two categories of data: zone data and reputation data.

DAAR collects TLD zone data daily, either through ICANN's Centralized Zone Data Service or directly through agreements with TLD operators (or both). Zone data, the availability of which is contractually mandated for generic top-level domains (gTLDs) and volunteered by country code top-level domains (ccTLDs), are generally provided once a day. As such, DAAR will not observe zone changes made after the daily release of zone data until the following day.

DAAR collects reputation data from providers that were selected based on their reputation for accuracy (defined here as having near-consensus adoption across the operational security community). The providers must have clearly defined processes for adding and removing identified domain names from their feeds. Another selection criterion is the prevalence of use of the feed by academia in research papers and theses, as well as by industry in products and services. Finally, the feed must support at least one of the four security threat classifications that DAAR tracks. We are developing a more robust feed evaluation (selection/removal) methodology, which will be published in the near future.

How up-to-date is the data?

The data used by DAAR is updated each day. The domain counts are collected each day from fresh TLD zone files. Some registry operators only grant zone file access for limited periods before requiring renewal, and this creates occasional gaps. The reputation providers continually add domains to their lists. The system collects these updated data from each provider several times per day. Each provider also has a procedure for removing domains from their lists, and these removals are tracked and accounted for within DAAR. Generally, each provider lists a domain name for as long as the provider believes the domain constitutes a problem, after which the domain is removed. A domain name may be listed only for minutes, or for months, depending on the provider's policies and criteria.

There are a few lists that do not track abuse status and therefore do not provide "removal" flags. One example is the Anti-Phishing Working Group (APWG) phishing feed. This is a list of newly confirmed phishing identifiers, but the APWG does not then track to see which sites are up and which are down. To be conservative, when a domain is listed in the APWG feed, we only count that domain as "listed" or "active" for one day.
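The listing rules described above can be sketched in Python as follows. This is an illustration under stated assumptions, not DAAR's code: the `is_active` function, the `NO_REMOVAL_FEEDS` set, the feed identifiers, and the choice to treat the removal day itself as no longer active are all hypothetical.

```python
from datetime import date

# Hypothetical identifiers for feeds that publish no "removal" flags.
NO_REMOVAL_FEEDS = {"apwg"}


def is_active(feed, listed_on, removed_on, day):
    """Return True if a listing counts as active on `day`.

    feed:       feed identifier (hypothetical naming)
    listed_on:  date the provider first listed the domain
    removed_on: date the provider removed it, or None if still listed
    """
    if day < listed_on:
        return False
    if feed in NO_REMOVAL_FEEDS:
        # Conservative rule: count the domain for the listing day only.
        return day == listed_on
    # Assumption: the listing stops counting on the removal day itself.
    return removed_on is None or day < removed_on


listed = date(2017, 7, 1)
print(is_active("apwg", listed, None, date(2017, 7, 1)))           # True
print(is_active("apwg", listed, None, date(2017, 7, 2)))           # False
print(is_active("example-feed", listed, None, date(2017, 7, 2)))   # True
print(is_active("example-feed", listed, date(2017, 7, 2), date(2017, 7, 2)))  # False
```

A feed that tracks removals keeps a domain active until the provider delists it, whereas a feed without removal flags contributes the domain to a single day's count.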

Does DAAR find all instances of the security threats in the DNS?

No. The DAAR system collects security threat data from multiple reputation service providers. However, these providers do not see or list all threat activity happening on the Internet, and do not claim to. We therefore note that DAAR provides a baseline measurement, and that the amount of security threats associated with domain names is larger than what this system catalogues. Users of DAAR data should assume that the statistics it presents are a subset of the security threat problem in a given TLD.

Does DAAR list only domains that were registered by malicious parties?

No. In general, most reputation providers cannot definitively attribute motives to actors registering domain names. Likewise, DAAR is not likely to know the motives for registering domain names. DAAR relies on reputation service providers that use modern security threat detection. Some providers' feeds may also contain domain names for which the hosting service has been compromised, resulting in the domain being used for malicious purposes. We are working on methodologies to distinguish between maliciously registered domains and compromised ones.

How does DAAR deal with false positives in reputation data?

Based on independent review, false positive rates are low among the lists we have chosen. DAAR does not modify the data received from reputation feeds, so if the feeds include false positives, those false positives are reflected in DAAR's output. However, since numerous parties rely on these reputation feeds – for example, email service providers, Internet service providers, and resolver operators – any false positives will affect the domain name ecosystem regardless of how DAAR reports them. As such, efforts within the DAAR system to further reduce false positives would result in conflicting or false information about how reported security threats affect those parties. ICANN's SSR Team continuously monitors the quality of the data, and feeds can be added or removed based on quality assessments performed by members of the SSR Team or on the community's feedback.

Who will have access to DAAR?

Only ICANN staff and contracted developers can access DAAR directly through its administrative interface. Registries can now access their own data via the ICANN SLAM system. For more information, please contact Gustavo.Lozano@icann.org.

OCTO's SSR team will be working with the ICANN community to determine the best way to share the statistics and analyses derived from data that DAAR collects.

What is the relationship between DAAR and the Open Data Initiative?

The Open Data Initiative is an umbrella term for efforts aimed at making it easy for anyone to access data that the ICANN organization or community creates or curates. DAAR uses data from public, open, and/or commercial sources. DNS zone data and WHOIS registration data are publicly available. Certain reputation data sources are open source, whereas others are commercial feeds requiring a license or subscription. For some commercial feeds, licensing permits derivative but not direct use. In cases where there are no limitations on redistribution of DAAR-related data, these data and analyses will be published periodically and included in the Open Data Initiative.

What role will DAAR play in ICANN policy?

The purpose of DAAR is to provide verifiable and reproducible data to facilitate analyses that could be useful in making informed consensus policy decisions. DAAR assembles a composite of the domain name reputation data that the operational security community observes, reports, and uses. It is up to the ICANN community to determine whether or how to use the reports derived from DAAR-collected data in policy deliberations.

How does DAAR fall into ICANN's remit?

For ICANN to help ensure the security and stability of the top-level of the Internet's system of unique identifiers that it coordinates according to its mission, both the ICANN organization and community must be aware of threats to that system. In keeping with ICANN's requirements for openness and transparency, the organization must, as much as is possible, make the data we collect available to the community. Finally, the role of the ICANN organization in general, and the Office of the CTO in particular, is to provide neutral, unbiased data and analyses to facilitate policy discussions and development.

How can I provide input on DAAR?

List of Reputation Data Providers and data feeds

As of July 2017, DAAR incorporates the following blocklists.

  • Spamhaus Domain Blocklist (DBL). Domains advertised in spam, domains used for phishing, and domains used to support malware.
  • SURBL. Domains advertised in spam, domains used for phishing, and domains used to support malware.
  • Anti-Phishing Working Group. URL blocklist feed: domains used for phishing.
  • PhishTank. Domains used for phishing.
  • Malware Patrol. Domains used to support malware. In addition, Malware Patrol's feed incorporates listings from these malicious domain blocklists:

    • SpamAssassin
    • Carbon Black Malicious Domains
    • Squid Web Proxy
    • Smoothwall
    • Symantec Email Security for SMTP
    • Symantec Web Security
    • Firekeeper
    • DansGuardian
    • Ransomware URLs
    • Botnet C&C server IPs
  • Ransomware Tracker. Malware botnet C&C servers.
  • Feodotracker. Domains used to support malware.