This post is the first in an ongoing series of updates on the status of the root KSK roll project. We intend to keep the community updated on our efforts to proceed with the rollover.
On 27 September 2017, the ICANN org announced that we were postponing the root zone KSK roll. More recently, on 17 October, we published a paper entitled Postponing the Root KSK Roll [PDF, 173 KB] that gives further details on the resolver configuration information we received that influenced the decision, as well as our analysis and our reasoning behind the postponement.
As we described in that paper [PDF, 173 KB], the most recent versions of the BIND and Unbound recursive resolvers implement a protocol defined in RFC 8145 [TXT, 27 KB], Signaling Trust Anchor Knowledge in DNS Security Extensions (DNSSEC), that allows a resolver to report its trust anchor configuration. A resolver supporting this feature reports its configured trust anchors for the root zone to the root name servers. Analysis of this reported trust anchor data in the weeks leading up to the previously scheduled rollover date of 11 October 2017 raised concerns that the population of resolvers not configured with KSK-2017 (our shorthand for the next root zone KSK) as a trust anchor could be larger than anticipated. Because those resolvers would not trust the new key, they would be unable to resolve DNS queries once the rollover occurs.
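One of the signaling mechanisms in RFC 8145 is a "key tag query": the resolver sends a query (QTYPE NULL) whose name encodes the key tags of its configured trust anchors as four-digit hexadecimal values, sorted in ascending order and joined by hyphens after a `_ta-` prefix. The key tag of KSK-2010 is 19036 (hex `4a5c`) and that of KSK-2017 is 20326 (hex `4f66`). The following is a minimal sketch, not ICANN's analysis code, of how such a query name could be built and parsed to spot resolvers that report only KSK-2010. The helper names are hypothetical, and it assumes the root-zone case, where the name is a single `_ta-` label under the root.

```python
import string

KSK_2010 = 19036  # key tag of the current root KSK (KSK-2010)
KSK_2017 = 20326  # key tag of the next root KSK (KSK-2017)


def build_ta_qname(key_tags):
    """Build a root-zone key tag QNAME such as '_ta-4a5c-4f66.'.

    Tags are de-duplicated, sorted ascending, and written as
    zero-padded four-digit lowercase hexadecimal values.
    """
    tags = sorted(set(key_tags))
    return "_ta-" + "-".join(f"{t:04x}" for t in tags) + "."


def parse_ta_qname(qname):
    """Return the set of key tags encoded in a root-zone '_ta-' QNAME.

    Returns None if the name is not a single '_ta-' label or if any
    tag is not exactly four hexadecimal digits. Matching is
    case-insensitive, since DNS names are compared that way.
    """
    labels = qname.rstrip(".").split(".")
    if len(labels) != 1:
        return None  # assumption: only root trust anchors are of interest
    label = labels[0].lower()
    if not label.startswith("_ta-"):
        return None
    parts = label[len("_ta-"):].split("-")
    if any(len(p) != 4 or not all(c in string.hexdigits for c in p) for p in parts):
        return None
    return {int(p, 16) for p in parts}


def reports_only_ksk2010(key_tags):
    """True if the reported trust anchors are exactly {KSK-2010}."""
    return key_tags == {KSK_2010}
```

For example, a resolver configured with both keys would send `build_ta_qname([KSK_2017, KSK_2010])`, which yields `_ta-4a5c-4f66.`, while a resolver that reports only `_ta-4a5c.` is the kind of case discussed below. RFC 8145 also defines an EDNS option carrying the same key tags; that mechanism is not modeled here.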
The Research group in the Office of the CTO (OCTO) analyzed traffic to the B, D, F and L root servers for the entire month of September 2017 and found 11,982 unique IP addresses (8,908 IPv4 and 3,078 IPv6) sending trust anchor configuration information. Of those, 620 addresses reported being configured with only KSK-2010 (the shorthand for the current root zone KSK). Upon further analysis, we were able to eliminate some false positives: IP addresses that, for various reasons, did not represent recursive resolvers performing DNSSEC validation. We reduced the list to 500 addresses of possibly misconfigured recursive resolvers whose operators we would like to contact. We have two main reasons for wanting to reach them: to understand why their resolvers report being configured with only KSK-2010 and, if appropriate, to help them correct the configuration in preparation for the rollover.
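The counting step described above can be sketched as follows. This is an illustration under stated assumptions, not OCTO's actual method: the input format of `(source_ip, key_tags)` pairs and the `summarize` helper are hypothetical, and an address is counted as "KSK-2010 only" only if every signal it sent during the window listed KSK-2010 alone (a resolver that was updated mid-month would therefore not be counted). The false-positive filtering mentioned above is not modeled.

```python
import ipaddress
from collections import defaultdict

KSK_2010 = 19036  # key tag of KSK-2010


def summarize(observations):
    """Aggregate trust anchor signals per unique source address.

    observations: iterable of (source_ip, key_tags) pairs, where
    key_tags is an iterable of integer key tags (hypothetical format).
    """
    tags_by_addr = defaultdict(set)
    for src, tags in observations:
        # ip_address() normalizes equivalent IPv6 spellings to one value.
        addr = ipaddress.ip_address(src)
        tags_by_addr[addr] |= set(tags)  # union of everything it reported

    ipv4 = sum(1 for a in tags_by_addr if a.version == 4)
    only_2010 = {str(a) for a, t in tags_by_addr.items() if t == {KSK_2010}}
    total = len(tags_by_addr)
    return {
        "unique": total,
        "ipv4": ipv4,
        "ipv6": total - ipv4,
        "only_ksk2010": only_2010,
        "share_only_ksk2010": len(only_2010) / total if total else 0.0,
    }
```

With documentation-range addresses as sample input, two spellings of the same IPv6 address collapse into one entry, and an address that reported both keys at some point is not flagged:

```python
sample = [
    ("192.0.2.1", {19036}),
    ("2001:db8::1", {19036, 20326}),
    ("2001:DB8:0::1", {19036, 20326}),
    ("198.51.100.7", {19036}),
    ("198.51.100.7", {19036, 20326}),
]
summarize(sample)  # 3 unique addresses (2 IPv4, 1 IPv6); only 192.0.2.1 flagged
```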
We had initially planned to make this list of addresses public to enlist the community's help. Upon further reflection, we realized that such a list could be taken out of context as an attempt to "name and shame" operators with misconfigured systems, which is not our intent at all. We've decided to make an initial attempt to contact the administrators ourselves. Depending on the outcome, we might need to publish the list of addresses whose administrators we are unable to reach.
According to the September data mentioned above, 4.1% of IP addresses report only KSK-2010. (The actual percentage of all resolvers on the Internet with only KSK-2010 may be higher: only a very small number of resolvers currently report their trust anchor configuration, so this sample may not be representative.) We want to reduce that number significantly through our investigation and mitigation. Since we don't know how many administrators we'll be able to contact, we don't want to set a target percentage just yet.
It's important to note that this value represents a percentage of resolvers, not end users, and the impact on end users is what matters most. The criteria in the published operational plan [PDF, 741 KB] for backing out of the rollover in the event of problems reference the effect on end users:
ICANN will consider back out of any step in the key roll process if the measurement program indicates a considerable amount of the estimated Internet end-user population has been negatively impacted by the change 72 hours after each change has been deployed into the root zone.
These criteria were derived in part from the recommendations of the root KSK roll design team that ICANN convened to help plan the rollover, whose report [PDF, 1.2 MB] includes this recommendation that also centers on end users:
Recommendation 16: Rollback of any step in the key roll process should be initiated if the measurement program indicated that a minimum of 0.5% of the estimated Internet end-user population has been negatively impacted by the change 72 hours after each change has been deployed into the root zone.
Throughout this process, we will treat end-user impact as more important than the absolute percentage of resolvers that still report only KSK-2010. After we've contacted as many resolver operators as we can, we'll attempt to determine the number of end users affected by the remaining resolvers not yet reporting KSK-2017. Determining the number of end users that rely on a particular resolver is difficult, but we have several ideas and sources of data to help with this task.
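To illustrate why a resolver percentage and an end-user percentage can diverge, here is a minimal sketch that weights each resolver by an estimated user count and compares the result with the 0.5% figure from Recommendation 16. The per-resolver user estimates, the total population figure, and the function names are all hypothetical placeholders; as noted above, obtaining such estimates is the hard part, and this is not the measurement program's method.

```python
ROLLBACK_THRESHOLD = 0.005  # 0.5% of the estimated end-user population


def impacted_user_share(users_by_resolver, ksk2010_only, total_users):
    """Fraction of estimated end users behind resolvers reporting only KSK-2010.

    users_by_resolver: hypothetical mapping of resolver address -> estimated users.
    ksk2010_only: set of resolver addresses still reporting only KSK-2010.
    total_users: estimated total end-user population (must be positive).
    """
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    affected = sum(users_by_resolver.get(r, 0) for r in ksk2010_only)
    return affected / total_users


def meets_rollback_threshold(share, threshold=ROLLBACK_THRESHOLD):
    """Recommendation 16 speaks of 'a minimum of 0.5%', so compare with >=."""
    return share >= threshold
```

In a made-up example where two of three resolvers (67% of resolvers) report only KSK-2010 but serve an estimated 40,000 and 200 users out of 10,000,000, the impacted share is about 0.4% of end users, which falls below the 0.5% figure:

```python
users = {"192.0.2.1": 40_000, "198.51.100.7": 200, "203.0.113.9": 1_000_000}
share = impacted_user_share(users, {"192.0.2.1", "198.51.100.7"}, 10_000_000)
meets_rollback_threshold(share)  # False
```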
We'll report more on these efforts and other developments in future blog posts as we keep the community updated on our progress.