Reporting Potential Pandemic-Related Domains
Any significant event attracts a number of interested parties. Both offline and online, some parties will have good intentions and, unfortunately, others will not. During newsworthy events in particular, domain registrations containing terms related to those events inevitably increase.
The global COVID-19 pandemic is no different. We see people registering sites to raise money or provide help to those affected either directly or indirectly. We also see people creating domains that use the pandemic as a hook to gain traffic and draw victims in, subjecting end users to phishing, malware, and scams.
In response to the latter, a number of groups are creating and making available "threat intelligence," that is, information and data on reported or observed security threats related to these domains. Some groups, like the COVID-19 Cyber Threat Intelligence League and the COVID-19 Cyber Threat Coalition, are taking action to counter these malicious actors.
ICANN organization (org) is contributing to this COVID-19 anti-abuse effort, using our knowledge and expertise to put actionable intelligence into the hands of those able to disrupt malicious campaigns. We are filtering lists created from zone files and enriching them with data from external sources to separate the domain names that may be malicious from the majority that are not. However, we must be careful not to add more noise for those already swamped with information. In other words, while we aim to provide data that allows a rapid assessment of a domain's status, we also need a high level of confidence in that assessment. Without both of these, we may end up doing more harm than good.
We are now producing reports on recent domain registrations that we believe to be using the COVID-19 pandemic for phishing or malware campaigns. These reports, which are shared with the responsible parties (primarily registrars or registries), contain the evidence that leads ICANN org to believe the domains are being used maliciously, along with other background information to help the responsible parties determine the correct course of action.
To generate the reports, we examine available generic top-level domain (gTLD) zone files. These files allow us to see newly delegated registrations (although our report generation process can actually accept input from any source of domains or hostnames). Specifically, we look for new zone file entries that contain words like "COVID," "corona," "pandemic," and other related terms. The resulting list of domain names is not yet actionable, as it will include both benign and malicious domains. We need to refine the list further before we can have high confidence in its contents.
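As a minimal sketch of this filtering step, assuming each day's zone file is available as lines of the form `<owner> <ttl> <class> NS <nameserver>`, the snippet below diffs two snapshots and keeps the newly delegated names that contain a search term. The function names and the three-term list are illustrative only; the full term list is not published here.

```python
from typing import Iterable, List, Set

# Illustrative subset of the search terms named in the text; the real list is longer.
SEARCH_TERMS = ("covid", "corona", "pandemic")

def delegated_names(zone_lines: Iterable[str]) -> Set[str]:
    """Collect owner names of NS records, assuming '<owner> <ttl> <class> NS <target>' lines."""
    names = set()
    for line in zone_lines:
        fields = line.split()
        if len(fields) >= 5 and fields[3].upper() == "NS":
            # Normalize: drop the trailing root dot and lowercase the name.
            names.add(fields[0].rstrip(".").lower())
    return names

def new_matching_names(old_zone: Iterable[str], new_zone: Iterable[str],
                       terms=SEARCH_TERMS) -> List[str]:
    """Names delegated in new_zone but not in old_zone that contain any search term."""
    new_names = delegated_names(new_zone) - delegated_names(old_zone)
    return sorted(n for n in new_names if any(t in n for t in terms))

old = ["example.com. 172800 in ns ns1.example.net."]
new = old + [
    "covid-relief.com. 172800 in ns ns1.example.net.",
    "coronafund.com. 172800 in ns ns2.example.net.",
    "bakery.com. 172800 in ns ns1.example.net.",
]
print(new_matching_names(old, new))  # ['coronafund.com', 'covid-relief.com']
```

Substring matching is deliberately coarse: it would also match an unrelated name containing, say, "Coronado," which is one reason the output is not yet actionable on its own.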
The next step is to look across a number of threat intelligence sources for indications that a domain is being used for phishing or malware distribution. We start with VirusTotal, AlienVault OTX, PhishTank, and Google Safe Browsing; however, our report generation process is designed to be extensible, so threat intelligence sources can be added or removed. The data provided by these sources can suggest a domain is malicious, although in most cases we find little to no evidence. This lack of evidence can be ascribed to the following:
- Many of the domains we see are "parked," i.e., registered with the intent to sell them at a profit or to generate revenue via advertising.
- Others are young domains that have not yet been used maliciously, or whose malicious use has not yet been observed.
Therefore, it is possible that over time the reputation of a domain will change. In order to allow for this, we periodically retest domains.
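The extensible design and periodic retesting described above might look like the following minimal sketch. Each threat intelligence source is a plug-in function registered by name, so adding or removing one is a dictionary operation. The stub feed, its fixed lookup set, and the seven-day `RETEST_INTERVAL` are hypothetical; a real plug-in would wrap a provider's API such as VirusTotal or PhishTank.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List

# Hypothetical plug-in interface: a source maps a domain name to a score (0 = no evidence).
Source = Callable[[str], int]
SOURCES: Dict[str, Source] = {}

def register_source(name: str):
    """Decorator that adds a source; removing one is simply SOURCES.pop(name)."""
    def decorator(fn: Source) -> Source:
        SOURCES[name] = fn
        return fn
    return decorator

def score_domain(domain: str) -> Dict[str, int]:
    """Query every registered source and add a 'total' entry."""
    scores = {name: fn(domain) for name, fn in SOURCES.items()}
    scores["total"] = sum(scores.values())
    return scores

# Hypothetical stub standing in for a real feed; it reads a fixed set, not a live API.
KNOWN_BAD = {"covid-relief.com"}

@register_source("stub_feed")
def stub_feed(domain: str) -> int:
    return 1 if domain in KNOWN_BAD else 0

RETEST_INTERVAL = timedelta(days=7)  # hypothetical cadence

def due_for_retest(no_evidence_yet: Dict[str, date], today: date) -> List[str]:
    """Domains with no evidence so far whose last check is at least RETEST_INTERVAL old."""
    return sorted(d for d, checked in no_evidence_yet.items()
                  if today - checked >= RETEST_INTERVAL)
```

Keeping the scoring loop ignorant of individual providers is what makes the source list easy to change; the retest queue then only needs to remember when each domain without evidence was last checked.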
It is worth noting that the sources we use focus on malware and phishing, so the malicious activity that we see is largely within these two categories. We do also pick up domains involved in spam or other undesirable behaviors. In order for us to consider those domains, they must also be listed in the malware or phishing sources. It is not uncommon for domains to exist in multiple categories, because different collection methods employed by threat intelligence providers detect different aspects of a malicious campaign. Also, categorization is not an exact science and can be open to interpretation. Therefore, when we use the term "malicious" here we are referring to phishing or malware; however, that does not preclude a domain from appearing on a spam or other undesirable behavior list.
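The inclusion rule above reduces to a set intersection. In this minimal sketch the category labels are hypothetical strings; real providers use their own taxonomies, so a mapping step would normally come first.

```python
from typing import Iterable

# Only these categories make a domain eligible for a report (labels are hypothetical).
QUALIFYING_CATEGORIES = frozenset({"phishing", "malware"})

def is_reportable(categories: Iterable[str]) -> bool:
    """True if any listing is phishing or malware; spam-only domains are excluded."""
    return not QUALIFYING_CATEGORIES.isdisjoint(c.lower() for c in categories)

print(is_reportable({"spam"}))              # False
print(is_reportable({"spam", "Phishing"}))  # True
```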
When credible evidence of malicious activity is found, we gather more information about the domains in line with the reporting requirements specified by registrars in the Guide to Registrar Abuse Reporting. This information includes the registrar (and in particular its abuse contact), hosting information, and so on. It is gathered to help those receiving our reports decide whether to take action (such as suspension) against the domain. The report generation process as described can be summarized in the following flowchart:
The output of this process is gathered into two different formats:
- A Comma-Separated Values (CSV) summary containing the name, resolved IP address, and name server records for each domain, plus the basic score from each threat intelligence source and a total.
- A Markdown file per domain containing more detail about that domain, including links to external evidence when available and appropriate.
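A minimal sketch of the two output formats follows, using Python's standard `csv` module. The column names, the input `row` layout, the source names `src_a`/`src_b`, and the example URL are hypothetical; the text only specifies which fields each format carries.

```python
import csv
import io
from typing import List

def write_summary(rows: List[dict], sources: List[str]) -> str:
    """One CSV line per domain: name, IP, name servers, per-source scores, total."""
    fields = ["name", "ip", "nameservers", *sources, "total"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    for row in rows:
        out = {"name": row["name"], "ip": row["ip"],
               "nameservers": " ".join(row["nameservers"])}
        # Sources with no entry for this domain score 0.
        out.update({s: row["scores"].get(s, 0) for s in sources})
        out["total"] = sum(out[s] for s in sources)
        writer.writerow(out)
    return buf.getvalue()

def domain_markdown(row: dict) -> str:
    """Per-domain detail file with links to external evidence, if any."""
    lines = [f"# {row['name']}", "",
             f"- IP: {row['ip']}",
             f"- Name servers: {', '.join(row['nameservers'])}",
             "", "## Evidence"]
    lines += [f"- {url}" for url in row.get("evidence", [])] or ["- none recorded"]
    return "\n".join(lines) + "\n"

row = {
    "name": "covid-relief.com",
    "ip": "192.0.2.10",
    "nameservers": ["ns1.example.net", "ns2.example.net"],
    "scores": {"src_a": 1},
    "evidence": ["https://example.org/report/1"],
}
print(write_summary([row], ["src_a", "src_b"]))
print(domain_markdown(row))
```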
Since we began this effort at the end of March, we have seen an average of approximately 3,250 new domain names per day that match our search terms. At the time of writing, this amounts to more than 82,000 names. Of these, we have found evidence linking approximately 7,000 to malicious behavior and have been able to resolve those domains (i.e., they had not already been suspended). For any given day's registrations, the number of domains with associated threat intelligence grows over time, simply because there has been more opportunity for the behavior associated with those domains to be observed. As a result, the proportion of older domain names with reports will be higher than that of younger ones. On the flip side, the malicious domains are starting to be noticed and suspended.
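The figures above imply a rough collection window and hit rate. This back-of-the-envelope calculation uses only the approximate numbers quoted, so its results are approximate too.

```python
avg_new_per_day = 3_250      # approximate daily matches quoted above
total_matched = 82_000       # "more than 82,000" at the time of writing
malicious_resolving = 7_000  # approximate names with evidence that still resolve

implied_days = total_matched / avg_new_per_day  # length of the collection period
hit_rate = malicious_resolving / total_matched  # share of matched names reported

print(f"{implied_days:.1f} days, {hit_rate:.1%} reported")  # 25.2 days, 8.5% reported
```

In other words, the quoted totals correspond to roughly 25 days of collection, and fewer than one in ten matching names carried enough evidence to be reported.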
You may notice that the numbers shared above are lower than those in other public reports. There are two primary reasons for this:
- We start only with domain names found in zone files. We do not feed in lists of full hostnames seen in passive DNS sources or some other equivalent starting point.
- We only list domains that have sufficient evidence of malicious activity and still resolve.
Our approach rules out domains that have been bought by speculators and parked. While blocking access to these domains may not be disruptive to an end user, we could not confidently claim that these domains are malicious.
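The two criteria listed above (sufficient evidence, still resolves) can be sketched as a final filter. Parked domains drop out naturally here because they carry no threat score. The minimum total of 1 is a hypothetical threshold, and `system_resolve` uses `socket.gethostbyname`, which checks IPv4 resolution only; the resolver is a parameter so it can be swapped out or faked.

```python
import socket
from typing import Callable, Dict, List, Optional

Resolver = Callable[[str], Optional[str]]

def system_resolve(domain: str) -> Optional[str]:
    """Return an IPv4 address for the domain, or None if it no longer resolves."""
    try:
        return socket.gethostbyname(domain)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return None

def reportable(totals: Dict[str, int], min_total: int = 1,
               resolve: Resolver = system_resolve) -> List[str]:
    """Keep domains with enough evidence that still resolve (threshold is hypothetical)."""
    return sorted(d for d, total in totals.items()
                  if total >= min_total and resolve(d) is not None)
```

For example, with a fake resolver that only knows `covid-relief.com`, a domain with evidence that no longer resolves (perhaps already suspended) is dropped, as is a resolving domain with a total of 0.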
It is a sad fact of life that there will always be malicious actors willing to turn any situation, even a pandemic, to their advantage. Our response to the misuse of domain names has been to put the evidence of which domains we believe to be malicious into the hands of those best placed to take appropriate action. We also believe that the bar to providing this evidence must be high enough so we do not contribute more noise than signal. The report generation process we've implemented goes to significant lengths to avoid false positives. We look forward to working with our community and others to improve our processes.