
Reporting Potential Pandemic-Related Domains

Introduction

Any significant event attracts a number of interested parties. Both offline and online, some parties will have good intentions and, unfortunately, others will not. During newsworthy events in particular, domain registrations containing terms related to those events inevitably increase.

The global COVID-19 pandemic is no different. We see people registering sites to raise money or provide help to those affected either directly or indirectly. We also see people creating domains that use the pandemic as a hook to gain traffic and draw victims in, subjecting end users to phishing, malware, and scams.

In response to the latter, a number of groups are creating and making available "threat intelligence," that is, information and data on reported or observed security threats related to these domains. Some groups, like the COVID-19 Cyber Threat Intelligence League and the COVID-19 Cyber Threat Coalition, are taking action to counter these malicious actors.

ICANN organization (org) is contributing to this COVID-19 anti-abuse effort, using our knowledge and expertise to put actionable intelligence into the hands of those able to disrupt malicious campaigns. We are filtering lists created from zone files and enriching them with data from external sources to separate the domain names that may be malicious from the majority that are not. However, we must be careful not to add more noise for those already swamped with information. In other words, while we aim to provide data that allows a rapid assessment of a domain's status, we also need a high level of confidence in that assessment. Without both of these, we may end up doing more harm than good.

We are now producing reports on recent domain registrations that we believe to be using the COVID-19 pandemic for phishing or malware campaigns. These reports, which are shared with the responsible parties (primarily registrars or registries), contain the evidence that leads ICANN org to believe the domains are being used maliciously, along with other background information to help the responsible parties determine the correct course of action.

Process Detail

To generate the reports, we examine available generic top-level domain (gTLD) zone files. These files allow us to see newly delegated registrations (although our report generation process can actually receive input from any source of domains or hostnames). Specifically, we look for new zone file entries that contain words like "COVID," "corona," "pandemic," and other related terms. The resulting list of domain names is not yet actionable, as it will include both benign and malicious domains. We need to refine the list further before we can have high confidence in its contents.
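The keyword-matching step described above can be sketched as follows. This is a minimal illustration, not ICANN org's actual implementation: the term list contains only the three words named in this post, the function name `candidate_domains` is hypothetical, and the input is assumed to be zone file lines whose first field is a fully qualified owner name.

```python
import re

# Only the three terms named in the post; the real list of related terms is not published here.
SEARCH_TERMS = ("covid", "corona", "pandemic")
TERM_PATTERN = re.compile("|".join(re.escape(term) for term in SEARCH_TERMS))


def candidate_domains(zone_lines):
    """Yield each distinct owner name that contains one of the search terms."""
    seen = set()
    for line in zone_lines:
        line = line.strip()
        # Skip blank lines and zone file comments.
        if not line or line.startswith(";"):
            continue
        # Assumption: the first field is a fully qualified owner name.
        owner = line.split()[0].rstrip(".").lower()
        if owner not in seen and TERM_PATTERN.search(owner):
            seen.add(owner)
            yield owner


# Illustrative records under the reserved .example TLD.
sample = [
    "; sample zone data",
    "covid-relief.example. 3600 IN NS ns1.example.net.",
    "covid-relief.example. 3600 IN NS ns2.example.net.",
    "bakery.example. 3600 IN NS ns1.example.net.",
    "CoronaMasks.example. 3600 IN NS ns1.example.org.",
]
matches = list(candidate_domains(sample))
```

A substring match like this is deliberately broad, so it will also catch unrelated names (for example, a brand that happens to contain "corona"), which is one reason the list needs the further refinement described next.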

The next step is to look across a number of threat intelligence sources for indications of the domain being used for phishing or malware distribution. We start by using VirusTotal, AlienVault OTX, PhishTank, and Google Safe Browsing; however, our report generation process is designed to be extensible so threat intelligence sources can be added or removed. The data provided by these sources can suggest a domain is malicious, although in most cases we find little to no evidence. This lack of evidence usually has one of two explanations:

  • Many of the domains we see are "parked," that is, registered with the intent to sell at a profit or generate revenue via advertising.
  • Others are young domains that have not yet been used maliciously, or whose malicious behavior has not yet been observed.

Therefore, it is possible that over time the reputation of a domain will change. In order to allow for this, we periodically retest domains.
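One way to picture an extensible set of sources combined with periodic retesting is the sketch below. Everything in it is an assumption made for illustration: the `Checker` class, the one-day retest interval, and the stub scoring functions are hypothetical, and no real threat intelligence API is called.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Dict

# A source is any callable that maps a domain name to a score (0 means no evidence).
Source = Callable[[str], int]


@dataclass
class Checker:
    sources: Dict[str, Source] = field(default_factory=dict)
    retest_after: timedelta = timedelta(days=1)  # assumed interval, not from the post

    def add_source(self, name: str, fn: Source) -> None:
        self.sources[name] = fn

    def remove_source(self, name: str) -> None:
        self.sources.pop(name, None)

    def check(self, domain: str, now: datetime) -> dict:
        """Query every registered source and record when the check happened."""
        scores = {name: fn(domain) for name, fn in self.sources.items()}
        return {
            "domain": domain,
            "scores": scores,
            "total": sum(scores.values()),
            "checked_at": now,
        }

    def needs_retest(self, result: dict, now: datetime) -> bool:
        """A domain is due for retesting once the interval has elapsed."""
        return now - result["checked_at"] >= self.retest_after


# Stub sources standing in for real lookups.
checker = Checker()
checker.add_source("stub_a", lambda domain: 2 if "covid" in domain else 0)
checker.add_source("stub_b", lambda domain: 1)

first_seen = datetime(2020, 4, 1)
result = checker.check("covid-relief.example", first_seen)
```

Because sources are plain callables held in a dictionary, adding or removing one does not change the checking logic, and a domain with no evidence today is simply checked again once `needs_retest` reports that it is due.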

It is worth noting that the sources we use focus on malware and phishing, so the malicious activity that we see is largely within these two categories. We also pick up domains involved in spam or other undesirable behaviors, but for us to consider those domains, they must also be listed in the malware or phishing sources. It is not uncommon for domains to appear in multiple categories, because the different collection methods employed by threat intelligence providers detect different aspects of a malicious campaign. Also, categorization is not an exact science and can be open to interpretation. Therefore, when we use the term "malicious" here we are referring to phishing or malware; however, that does not preclude a domain from appearing on a spam or other undesirable behavior list.
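The category rule in the preceding paragraph can be expressed as a simple set test. The category labels and the function name `is_considered` below are hypothetical; real providers use their own taxonomies, which would need mapping onto labels like these.

```python
# Categories that qualify a domain for consideration, per the rule described above.
QUALIFYING = {"malware", "phishing"}


def is_considered(categories):
    """Return True if at least one listed category is malware or phishing."""
    return bool(QUALIFYING & set(categories))


spam_only = is_considered({"spam"})
spam_and_phishing = is_considered({"spam", "phishing"})
```

Under this rule a spam listing on its own is not enough, while a domain listed for both spam and phishing qualifies because of the phishing listing.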

When credible evidence of malicious activity is found, we proceed to gather more information about the domains in line with the reporting requirements specified by registrars in the Guide to Registrar Abuse Reporting. This reporting information includes the registrar (and abuse contact in particular), hosting information, etc. This information is gathered to help those receiving our reports make the decision on whether or not to take action (such as suspension) against the domain. The report generation process as described can be summarized in the following flowchart:

[Figure: Reporting Potential Pandemic-Related Domains flowchart]

The output of this process is gathered into two different formats:

  • A comma-separated values (CSV) summary that contains the domain name, resolved IP address, and name server records, plus the basic score from each threat intelligence source and a total.
  • A Markdown file per domain that contains more detail about that domain, including links to external evidence when available and appropriate.
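The two output formats might be produced along the following lines. This is a sketch under stated assumptions: the column names, the record layout, the source names `source_a` and `source_b`, and the evidence URL are all invented for illustration and do not reflect the actual report schema.

```python
import csv
import io


def write_summary(records, source_names, out):
    """Write one CSV row per domain: name, IP, name servers, per-source scores, total."""
    writer = csv.writer(out)
    writer.writerow(["domain", "resolved_ip", "name_servers", *source_names, "total"])
    for rec in records:
        scores = [rec["scores"].get(name, 0) for name in source_names]
        writer.writerow(
            [rec["domain"], rec["ip"], " ".join(rec["ns"]), *scores, sum(scores)]
        )


def render_markdown(rec):
    """Render the per-domain detail file, with links to external evidence if any."""
    lines = [
        f"# {rec['domain']}",
        "",
        f"- Resolved IP: {rec['ip']}",
        f"- Name servers: {', '.join(rec['ns'])}",
        "",
        "## Evidence",
        "",
    ]
    lines += [f"- [{label}]({url})" for label, url in rec.get("evidence", [])]
    return "\n".join(lines) + "\n"


# Illustrative record using reserved documentation names and addresses.
record = {
    "domain": "covid-relief.example",
    "ip": "192.0.2.10",
    "ns": ["ns1.example.net", "ns2.example.net"],
    "scores": {"source_a": 2, "source_b": 1},
    "evidence": [("source_a report", "https://intel.example/report/1")],
}

buffer = io.StringIO()
write_summary([record], ["source_a", "source_b"], buffer)
summary_rows = buffer.getvalue().splitlines()
detail = render_markdown(record)
```

The CSV gives a recipient a quick overview across many domains, while the Markdown file carries the supporting detail for a single domain.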

Statistics

Since we began this effort at the end of March, we have seen an average of approximately 3,250 new domain names per day that match our search terms. This has resulted, at the time of writing, in more than 82,000 names. Of these, we have found evidence linking approximately 7,000 names to malicious behavior and have been able to resolve the domains (i.e., the domain names had not already been suspended). For the domains registered on any given day, the number that have threat intelligence associated with them increases over time, simply because there has been more opportunity for the associated behavior to be observed. As such, the proportion of older domain names with reports will be higher than that of younger domain names. On the flip side, the malicious domains are starting to be noticed and suspended.

You may notice that the numbers shared above are lower than those in other public reports. There are two primary reasons for this:

  • We start with sources that contain domain names found in zone files only. We are not feeding in lists of full hosts seen in passive DNS sources or some other equivalent starting point.
  • We only list domains that have sufficient evidence of malicious activity and still resolve.

Our approach rules out domains that have been bought by speculators and parked. While blocking access to these domains may not be disruptive to an end user, we could not confidently claim that these domains are malicious.

Summary

It is a sad fact of life that there will always be malicious actors willing to turn any situation, even a pandemic, to their advantage. Our response to the misuse of domain names has been to put the evidence of which domains we believe to be malicious into the hands of those best placed to take appropriate action. We also believe that the bar to providing this evidence must be high enough so we do not contribute more noise than signal. The report generation process we've implemented goes to significant effort to avoid false positives. We look forward to working with our community and others to improve our processes.
