Open Data Initiative Datasets and Metadata
Open Date
11 June 2018 23:59 UTC
Close Date
27 July 2018 23:59 UTC
Staff Report Due
3 September 2018 23:59 UTC
To seek community views on the list of datasets that ICANN has available to publish and the metadata that ICANN intends to publish along with each dataset. The comments will be used to determine the priorities for publication.
The Open Data Initiative is almost ready to start publishing datasets. A Data Asset Inventory has been created, which provides an initial list of the datasets that are potential candidates for publication, and associated metadata standards have been defined. The RFP to choose an open data platform is almost complete and an announcement will be made by ICANN62 in Panama City.
The feedback will be used to help determine the priorities for publishing datasets on the upcoming ICANN open data platform and to amend any elements of the publication plan as needed to address community feedback.
Section I: Description and Explanation
This public comment is intended to help guide the next stage of the Open Data Initiative and to provide detailed insight for potential data consumers and other interested community members.
Over the last few months, ICANN org has created a Data Asset Inventory. It lists the preliminary set of datasets that ICANN holds, along with associated attributes such as the system of record and data format. We intend to publish, over time, every dataset on this list that can be published as open data. This will be a lengthy and complex process, so we are seeking feedback that will help us prioritize the publication of datasets. The Data Asset Inventory is available in both CSV and PDF formats as detailed in section III.
In addition, we have defined a metadata vocabulary for the metadata that will be published alongside each dataset, and we are seeking feedback on it. The vocabulary is described in section IV below and is available in both CSV and PDF formats as detailed in section III.
The specific questions we seek feedback on are as follows:
What are your priorities for publication of datasets identified in the data asset inventory?
The next major stage in the Open Data Initiative is the lengthy process of publishing datasets on the upcoming open data platform. This stage could take some time, so it is important that community priorities are taken into account and the highest priority data are released first.
The datasets within ICANN are stored in a variety of formats across a variety of systems of record. In many cases, ICANN staff will need to develop custom code to publish a dataset with any required redaction or aggregation applied, and the time and cost of doing so can vary widely. Accordingly, we do not intend to translate community priorities directly into a prioritized publication list; instead, we will use a prioritization model that combines ease of publication with community priority.
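ICANN has not published the details of its prioritization model. As a purely hypothetical illustration of what "combining ease of publication with community priority" can mean, the Python sketch below scores each dataset as a weighted sum of two 0-to-1 scores. The `Candidate` class, the score scales, the linear form, and the 0.5 default weight are all assumptions made for this example, and the dataset names and numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical inventory entry with two illustrative scores."""
    name: str
    community_priority: float   # assumed scale: 0.0 (low) to 1.0 (high), e.g. from public comments
    ease_of_publication: float  # assumed scale: 0.0 (hard) to 1.0 (easy), e.g. a staff effort estimate


def priority_score(c: Candidate, weight_priority: float = 0.5) -> float:
    """Combine the two scores as a weighted sum.

    The linear form and the 0.5 default weight are illustrative assumptions.
    """
    if not 0.0 <= weight_priority <= 1.0:
        raise ValueError("weight_priority must be between 0 and 1")
    return (weight_priority * c.community_priority
            + (1.0 - weight_priority) * c.ease_of_publication)


def publication_order(candidates: list[Candidate],
                      weight_priority: float = 0.5) -> list[Candidate]:
    """Return candidates sorted from highest to lowest combined score."""
    return sorted(candidates,
                  key=lambda c: priority_score(c, weight_priority),
                  reverse=True)


# Example with made-up datasets and scores.
inventory = [
    Candidate("dataset-a", community_priority=0.9, ease_of_publication=0.2),
    Candidate("dataset-b", community_priority=0.6, ease_of_publication=0.9),
    Candidate("dataset-c", community_priority=0.3, ease_of_publication=0.4),
]
for c in publication_order(inventory):
    print(c.name, round(priority_score(c), 2))
```

With these invented numbers, the most requested but hardest-to-publish dataset-a ranks below dataset-b, which shows why community priority alone would not fix the publication order. Setting `weight_priority` to 1.0 would reduce the model to community priority only.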
Are there any errors or omissions in the data asset inventory?
Creating an inventory of datasets is a complex process, as this is a cutting-edge subject and ICANN org staff, like those of many other organizations, are still learning what constitutes a dataset. The inventory may therefore contain errors or omissions, and for that reason we seek feedback on the inventory as provided.
Even if you do not know whether ICANN holds a specific dataset, please let us know if you would like to see that dataset published by ICANN.
Does the proposed metadata vocabulary meet your needs?
The metadata vocabulary is based on the Project Open Data Metadata Schema v1.1 with minor amendments. We have chosen this standard over alternatives such as DCAT due to its simplicity, greater applicability and ease of processing. This choice does not preclude us from adding other metadata schemas to our published open data later.
Section II: Background
The term "Open Data" has a very precise meaning. Data are open if anyone is free to use, re-use or redistribute them, subject at most to measures that preserve provenance and openness. There are three dimensions of data openness:
- Data must be technically open, which means they must be published in electronic formats that are machine readable and non-proprietary, so that anyone can access and use the data with common, freely available software tools.
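- Data must be practically open, which means they must be publicly available and accessible on a public server, without password or firewall restrictions.
- Data must be legally open, which means they must be published under an open license that permits anyone to use, re-use and redistribute them.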
Open data in this context specifically means tabular data, that is, data that would normally be stored in a spreadsheet, data file, or database. Such data might also be stored in a more structured format such as JSON or XML. Information in a broader sense, such as policy documents, application forms, or email messages, will not go onto the open data platform; these documents fall under the Information Transparency Initiative instead.
The Open Data Initiative is the program within ICANN to identify all datasets that are not subject to confidentiality restrictions and to publish them as open data.
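To illustrate that the same tabular content can be published in a spreadsheet-style format such as CSV or in a more structured format such as JSON, and read with freely available tools, here is a minimal sketch using only Python's standard library. The column names and rows are hypothetical and are not taken from the ICANN Data Asset Inventory.

```python
import csv
import io
import json

# Hypothetical tabular extract; column names and values are illustrative only.
CSV_TEXT = """name,format,system_of_record
Example dataset A,CSV,System X
Example dataset B,JSON,System Y
"""


def csv_to_records(text: str) -> list[dict[str, str]]:
    """Parse CSV text (header row plus data rows) into one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


def records_to_json(records: list[dict[str, str]]) -> str:
    """Serialize the same rows as a JSON array of objects."""
    return json.dumps(records, indent=2)


records = csv_to_records(CSV_TEXT)
print(records_to_json(records))
```

Both representations carry the same rows and columns; only the serialization differs, which is why both fall within "tabular data" here.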
The Open Data Initiative is a personal goal of the President and CEO, who has written a blog post setting out his reasons for promoting open data, his expectations for how ICANN will deliver this, and his vision of how open data will benefit ICANN org and the ICANN community.
There are a number of different initiatives underway related to the publication of data, and they should be distinguished from one another:
- The Open Data Initiative, which this public comment relates to and which is explained above.
- The Information Transparency Initiative, which relates to simplifying access to documents and other non-tabular information. This initiative is independent of and complementary to the Open Data Initiative.
- The Identifier Technology Health Indicators (ITHI), a project gathering specific data relating to the health of identifier technologies. The output of this project will be published as part of the Open Data Initiative.
- Domain Name Marketplace Indicators, which is a project gathering specific data relating to the domain name marketplace. The output of this project will be published as part of the Open Data Initiative.
As noted above, the open data will be published on an upcoming open data platform. For several months, ICANN has been conducting an RFP seeking a Software-as-a-Service (SaaS) open data product from an established vendor. The RFP used a detailed list of requirements and a thorough assessment process, including presentations and demonstrations, involving multiple ICANN staff and contractors. The process is now in the final stage of deliberations and negotiations with a chosen vendor.
Section III: Relevant Resources
The public extract of the Data Asset Inventory is available in CSV format at https://www.icann.org/en/system/files/files/odi-data-asset-inventory-spreadsheet-11jun18-en.csv and in PDF format at https://www.icann.org/en/system/files/files/odi-data-asset-inventory-11jun18-en.pdf.
The metadata vocabulary as referred to in the Metadata Standard is available in CSV format at https://www.icann.org/en/system/files/files/odi-metadata-vocabulary-spreadsheet-11jun18-en.csv and in PDF format at https://www.icann.org/en/system/files/files/odi-metadata-vocabulary-11jun18-en.pdf.
Section IV: Additional Information
The ICANN metadata standard follows the Project Open Data Metadata Schema v1.1 and consists of a metadata vocabulary with the following customizations:
- Without the USG-tagged fields
- Without some fields that are not needed:
  - rights: all our data is public
  - temporal: to avoid confusion between this update and the main dataset
  - distribution: refers to URLs the open data platform should generate
  - issued: too complex to determine when a dataset was first published
- With values specified for some fields: publisher, accessLevel, license
- With custom restrictions on some fields
- With different requirements for which fields must be present
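The customizations above can be pictured as a small validation profile layered on a Project Open Data v1.1 style dataset record. The Python sketch below is a hypothetical illustration only: the fixed values for publisher, accessLevel and license, the required-field set, and the USG field examples (bureauCode, programCode) are assumptions, and the authoritative definitions are in the metadata vocabulary linked in section III. The custom field restrictions are not modeled.

```python
import json

# USG-tagged fields are dropped; bureauCode and programCode are two examples
# (assumed here; the full list is defined by the published vocabulary).
USG_FIELDS = {"bureauCode", "programCode"}

# Fields the ICANN profile omits, per the customizations listed above.
OMITTED_FIELDS = {"rights", "temporal", "distribution", "issued"} | USG_FIELDS

# Fields with specified values. HYPOTHETICAL placeholders, not ICANN's values.
FIXED_VALUES = {
    "publisher": {"@type": "org:Organization", "name": "ICANN"},  # assumption
    "accessLevel": "public",                                      # assumption
    "license": "https://example.org/hypothetical-license",        # placeholder
}

# HYPOTHETICAL required-field set; the real presence requirements are in the vocabulary.
REQUIRED_FIELDS = {"title", "description", "keyword", "modified",
                   "contactPoint", "identifier"}


def build_record(**fields) -> dict:
    """Build a dataset record, applying the fixed values last so they always win."""
    record = dict(fields)
    record.update(FIXED_VALUES)
    return record


def validate(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes these checks."""
    keys = set(record)
    problems = [f"missing required field: {k}" for k in sorted(REQUIRED_FIELDS - keys)]
    problems += [f"field not used in this profile: {k}" for k in sorted(OMITTED_FIELDS & keys)]
    problems += [f"unexpected value for fixed field: {k}"
                 for k, v in FIXED_VALUES.items() if record.get(k) != v]
    return problems


record = build_record(
    title="Example dataset",
    description="Illustrative record only.",
    keyword=["example"],
    modified="2018-06-11",
    contactPoint={"@type": "vcard:Contact", "fn": "Example Contact",
                  "hasEmail": "mailto:contact@example.org"},
    identifier="example-dataset-001",
)
print(validate(record))  # prints []
print(json.dumps(record, indent=2))
```

A record that includes an omitted field such as temporal, or that leaves out required fields, would produce one message per problem rather than failing on the first one.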
A link to the metadata vocabulary is given in section III.
Report of Public Comments