
Part I: What Are Metadata?


The concept of metadata is both simple and complicated. We readily understand what data are: they are the information that we communicate, process or consume in our ever-growing digitized society. But what are metadata?

Metadata: Data About Data

Data, especially digital data, take many forms. Voice conversations, text messaging or social media communicate data. Digital banking or merchant transactions involve the transfer of data. Web content, digitized and streamed entertainment, databases or information repositories of all kinds are examples of publications of data.

Metadata describe what these data are: they provide information about these data. That's pretty simple. However, if we dig a bit deeper, we find that "describing" data is both a rigorous technical exercise and a socio-politically charged issue. In this Part I, I'll explain what metadata are in a technical, quasi-scholarly manner.

What Kinds of Data Are Metadata?

Metadata provide a means to classify, organize, and characterize data or content. The National Information Standards Organization (NISO) provides a taxonomy that can be applied to all kinds of data or to data repositories, from libraries to web sites, for textual and non-textual data, in digitized or material forms.

NISO describes three types of metadata.

Descriptive metadata include information such as points of contact, the title or author of a publication, an abstract of a work, keywords used in a work, a geographic location, or even an explanation of methodology. These data are useful to discover, collect or group resources according to characteristics the resources share. To appreciate how descriptive metadata relate to informational data, visit the Business and Consumer Surveys pages hosted by the European Commission's Directorate-General for Economic and Financial Affairs. In addition to the survey data, you can also obtain the BCS Metadata for each EU member country's survey, for example, France. The metadata files identify the contact data, methodology, and date for each survey, but they do not contain the question or response data collected during the survey itself.

Structural metadata explain how a resource is composed or organized. A digitized book, for example, can be published as individual page images, PDF or HTML files. These pages or component parts might typically be grouped into chapters. The chapter data, table of contents, or page layout details are considered structural metadata. A structural map of the pages or other resources of a web site, security intrusion event record types or voice call detail records are also kinds of structural metadata.

Administrative metadata are used to manage a resource. Creation or acquisition dates, access permissions, rights or provenance, or guidelines for disposition such as retention or removal are examples of administrative metadata that a digital archivist or curator might employ. Similar metadata would be relevant for a database administrator, or for administrators responsible for capturing telecommunications or data network traffic flows or security log and event data.
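
To make the three types concrete, here is a minimal sketch in Python of how one resource's metadata might be grouped along these lines. The class name, field names, and the sample book are all hypothetical, invented purely for illustration; NISO does not prescribe this layout.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceMetadata:
    """Metadata for one resource, grouped by the three types described above."""
    descriptive: dict = field(default_factory=dict)     # what the resource is about
    structural: dict = field(default_factory=dict)      # how the resource is composed
    administrative: dict = field(default_factory=dict)  # how the resource is managed


# Hypothetical digitized book; every value here is made up for illustration.
book = ResourceMetadata(
    descriptive={
        "title": "An Example Book",
        "author": "A. Writer",
        "keywords": ["metadata", "libraries"],
    },
    structural={
        "format": "PDF",
        # chapter name -> page numbers that make up the chapter
        "chapters": {"Chapter 1": [1, 2, 3], "Chapter 2": [4, 5]},
    },
    administrative={
        "created": "2016-05-11",
        "access": "public",
        "retention": "permanent",
    },
)


def find_by_keyword(records, keyword):
    """Discover resources from descriptive metadata alone, without reading their content."""
    return [r for r in records if keyword in r.descriptive.get("keywords", [])]


def page_count(record):
    """Derive the number of pages from the structural metadata."""
    return sum(len(pages) for pages in record.structural.get("chapters", {}).values())
```

Note that neither helper ever touches the text of the book itself: `find_by_keyword` works from descriptive metadata only, and `page_count` from structural metadata only. That is the sense in which metadata are data about data.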

We've Only Scratched The Surface

Now that you've seen several kinds of metadata, you can appreciate how useful metadata can be for any party, organization, or government agency that collects, aggregates, manages, or retains metadata on a large scale. You may also appreciate how activities that involve metadata collection on a large scale can be sources of controversy. We'll cover these in the next post in this series.

Comments

    Florian Darion  07:06 UTC on 11 May 2016

    Hello, this is very interesting, but I'm French and I'm not sure I understand everything. Do you have a good French source that discusses metadata? Thank you.

    Patrick  05:09 UTC on 27 May 2016

    Hi Florian, feel free to use Google Translate; it's not perfect, but it's better than nothing ;)

    Regis Stephant  07:31 UTC on 06 June 2016

    Hello Patrick, when I see the damage caused by Google Translate, I tell myself I might just comment in French, lol. More seriously, and in French as a result, I find this subject of metadata fascinating, along with everything it covers in terms of privacy.

Domain Name System
Internationalized Domain Name (IDN)

IDNs are domain names that include characters used in the local representation of languages that are not written with the twenty-six letters of the basic Latin alphabet "a-z". An IDN can contain Latin letters with diacritical marks, as required by many European languages, or may consist of characters from non-Latin scripts such as Arabic or Chinese. Many languages also use other types of digits than the European "0-9". The basic Latin alphabet together with the European-Arabic digits are, for the purpose of domain names, termed "ASCII characters" (ASCII = American Standard Code for Information Interchange). These are also included in the broader range of "Unicode characters" that provides the basis for IDNs.

The "hostname rule" requires that all domain names of the type under consideration here are stored in the DNS using only the ASCII characters listed above, with the one further addition of the hyphen "-". The Unicode form of an IDN therefore requires special encoding before it is entered into the DNS.

The following terminology is used when distinguishing between these forms: A domain name consists of a series of "labels" (separated by "dots"). The ASCII form of an IDN label is termed an "A-label". All operations defined in the DNS protocol use A-labels exclusively. The Unicode form, which a user expects to be displayed, is termed a "U-label". The difference may be illustrated with the Hindi word for "test", परीका, appearing here as a U-label would (in the Devanagari script). A special form of "ASCII compatible encoding" (abbreviated ACE) is applied to this to produce the corresponding A-label: xn--11b5bs1di.

A domain name that only includes ASCII letters, digits, and hyphens is termed an "LDH label". Although the definitions of A-labels and LDH labels overlap, a name consisting exclusively of LDH labels, such as "icann.org", is not an IDN.
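
The relationship between a U-label and its A-label can be sketched with Python's built-in `punycode` codec. This is a simplified illustration, not a full IDNA implementation: it assumes the ACE prefix is `xn--` (as in the example above), applies the raw encoding only, and skips the normalization and validation steps a registry or browser would perform. The function names are hypothetical.

```python
import re

ACE_PREFIX = "xn--"

# LDH label as defined above: ASCII letters, digits, and hyphens only.
# Simplification: length and hyphen-position rules are not checked here.
LDH_LABEL = re.compile(r"[A-Za-z0-9-]+")


def to_a_label(u_label: str) -> str:
    """Convert a single Unicode label to its ASCII form (no validation)."""
    if u_label.isascii():
        return u_label  # already ASCII; nothing to encode
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Convert a single ASCII label back to its Unicode form (no validation)."""
    if not a_label.lower().startswith(ACE_PREFIX):
        return a_label  # plain LDH label; not an encoded IDN label
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def is_ldh_label(label: str) -> bool:
    """True if the label uses only ASCII letters, digits, and hyphens."""
    return LDH_LABEL.fullmatch(label) is not None


print(to_a_label("परीका"))            # xn--11b5bs1di
print(to_u_label("xn--11b5bs1di"))    # परीका
print(is_ldh_label("xn--11b5bs1di"))  # True
print(is_ldh_label("परीका"))          # False
```

The third line of output shows the overlap mentioned above: an A-label is itself made up only of letters, digits, and hyphens, so it passes the LDH check even though it represents an IDN label.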