The FAIR (Findable, Accessible, Interoperable, Reusable) principles have steadily gained significance over the last few years. A PwC analysis estimated that the European economy loses €10.2 billion every year because research data is not FAIR, plus a further €16 billion annually for dealing with the consequences (Source). Since then, the German research funding body DFG has included FAIR in its recently updated codex (Source) on good scientific practice and made compliance with it legally binding. However, only 15% (Source) of researchers are familiar with FAIR. We would therefore like to introduce the FAIR principles, explain where they come from and what they mean.
The FAIR principles are helpful tools to overcome data discovery and reuse obstacles
In 2014, a number of academic and private stakeholders held a workshop in Leiden, Netherlands, to overcome obstacles preventing data discovery and reuse. As a result, they agreed on guidelines for those wishing to enhance the reusability of their scientific data and, at the same time, the ability of machines to find and use data automatically. These guidelines, referred to as the FAIR principles, were first published in Scientific Data by Wilkinson et al. in 2016.
FAIR makes data available for researchers and machines
The guidelines’ goals make up an acronym: FAIR stands for Findable, Accessible, Interoperable and Reusable. They apply to data, to metadata (information about the data) and to the infrastructure that holds them. The central idea of applying the FAIR principles in data management is to ensure that existing scientific data can be found and accessed, and to make it reusable for other researchers and machines by including contextual information (metadata), e.g. the experimental parameters necessary to achieve the same results. The principles are also meant to make data accessible to IT systems, which either parse the data themselves or make it available to researchers, since humans increasingly rely on computational support to find, access and handle data. By applying the FAIR principles, interlinking existing data and making data accessible to machines, new scientific insights may emerge.
Of the four principles, findability is probably the most important one to realize, since all other principles depend on the data and metadata being findable. As a first step, ambiguity has to be removed by assigning globally unique and persistent identifiers to data and metadata (F1). To facilitate automatic searching and filtering, data has to be described with rich metadata (F2) that includes descriptive information about the context, quality, condition or characteristics of the data. To link data with its metadata, the metadata has to include, clearly and explicitly, the identifier of the data it describes (F3). Finally, data and metadata are registered or indexed in a searchable resource (F4), so that researchers can find them even without knowing beforehand that they exist.
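The four findability steps can be illustrated with a small sketch. It is only an illustration under stated assumptions: the record fields, the function names and the in-memory index are hypothetical, and a UUID stands in for a registered persistent identifier such as a DOI.

```python
import uuid


def make_metadata_record(title, description, keywords):
    """Build a minimal metadata record for one dataset (hypothetical schema)."""
    # F1: globally unique identifier. A UUID is unique, but persistence and
    # resolvability would in practice come from a registered DOI or handle.
    dataset_id = f"urn:uuid:{uuid.uuid4()}"
    return {
        "identifier": dataset_id,   # F3: the metadata names the data it describes
        "title": title,             # F2: descriptive metadata
        "description": description,
        "keywords": [k.lower() for k in keywords],
    }


def register(index, record):
    """F4: add the record to a searchable index, keyed by its identifier."""
    index[record["identifier"]] = record


def search(index, term):
    """Return identifiers of records whose title, description or keywords mention the term."""
    term = term.lower()
    hits = []
    for identifier, record in index.items():
        text = " ".join([record["title"], record["description"], *record["keywords"]]).lower()
        if term in text:
            hits.append(identifier)
    return hits


index = {}
record = make_metadata_record(
    title="Buffer pH titration series",
    description="pH readings for a phosphate buffer at 25 degrees Celsius",
    keywords=["pH", "titration", "phosphate"],
)
register(index, record)
found = search(index, "titration")  # finds the dataset without knowing its identifier
```

A real repository or data catalogue would replace the dictionary, but the division of labour stays the same: the identifier removes ambiguity, the metadata makes the record searchable, and the index makes it discoverable.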
Data and metadata are accessible when they can be retrieved by their identifiers using a standardised communication protocol (A1). The protocol should be open, free of charge and universally implementable (A1.1), so that at least the metadata can be retrieved by everybody. Where necessary, the protocol also allows for authentication and authorisation procedures (A1.2). In addition, the metadata has to remain accessible even when the data itself is no longer available (A2).
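As a hedged illustration of retrieval by identifier over an open protocol, the sketch below builds a plain HTTPS request for a DOI. The DOI is a placeholder, the function names are hypothetical, and it is an assumption that the resolver answers this media type for a given DOI; that depends on the registration agency.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the DOI resolver supports content negotiation for this media type.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_metadata_request(doi):
    """Build an HTTPS request that asks the DOI resolver for metadata instead of a landing page."""
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": CSL_JSON})


def fetch_metadata(doi, timeout=10):
    """Send the request and parse the JSON response (requires network access)."""
    with urllib.request.urlopen(build_metadata_request(doi), timeout=timeout) as response:
        return json.load(response)


# Placeholder DOI for illustration only; it is not expected to resolve.
request = build_metadata_request("10.1234/example-dataset")
```

HTTPS fits A1.1 because it is open, free and implemented everywhere. Restricted data would add an authentication step on top of the same request (A1.2), while the metadata record can stay publicly retrievable (A2).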
Interoperability typically refers to the ability of computer systems or software to exchange and make use of information, or at least to know the other system’s data exchange formats. This means that data should be readable by IT systems without the need for specialised or ad hoc algorithms, translators or mappings. Moreover, data needs to integrate with other data and work with workflows for analysis, storage and processing. To achieve interoperability, data and its metadata have to use a formal, accessible, shared and broadly applicable language for knowledge representation (I1). A key instrument for this is the use of common vocabularies that themselves follow the FAIR principles (I2), i.e. that are well documented and resolvable through globally unique and persistent identifiers. In addition, relationships between datasets have to be documented, e.g. whether one dataset builds on another, whether additional datasets are needed to complete the data, or whether complementary information is stored in a different dataset. This is achieved if data and metadata include qualified references to other (meta)data (I3).
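One possible way to express such a record is JSON-LD with the schema.org vocabulary, sketched below. Choosing schema.org is an assumption made for illustration, since many domains have their own community vocabularies, and the identifiers and values are placeholders.

```python
import json

record = {
    "@context": "https://schema.org",   # I2: shared, documented vocabulary
    "@type": "Dataset",
    "@id": "https://doi.org/10.1234/example-dataset",   # placeholder identifier
    "name": "Buffer pH titration series",
    "measurementTechnique": "potentiometric titration",
    # I3: a qualified reference states how the two datasets relate,
    # here "is based on", rather than merely linking to the other dataset.
    "isBasedOn": {"@id": "https://doi.org/10.1234/example-raw-readings"},
}

# I1: JSON is a formal, widely supported representation that any system can parse.
serialized = json.dumps(record, indent=2, sort_keys=True)
restored = json.loads(serialized)
```

Because both the format and the vocabulary are public, a second system can read the record and follow the reference without a custom translator.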
Reusability of scientific data is achieved if the data can be replicated or combined in different settings. Reuse is the ultimate goal of the FAIR principles. For users, whether machines or humans, to decide if data is useful in a specific context, it has to be richly described with a plurality of accurate and relevant attributes (R1). This may include the context in which the data was generated, such as experimental protocols or the manufacturer and brand of the machine or sensor that created the data, and even additional information that seems irrelevant to the publisher but may be relevant for others in a completely different context. The conditions under which machines and humans may use the data also have to be clear. Therefore, metadata and data have to be released with a clear and accessible data usage license (R1.1). Moreover, and this is almost common sense, the detailed provenance of the data and its metadata has to be provided (R1.2), e.g. its authors and predecessor projects. Finally, templates should be used when available so that data and metadata meet domain-relevant community standards (R1.3).
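A simple completeness check can show what these requirements mean for a single metadata record. This is a minimal sketch: the field names and their mapping to R1 through R1.3 are assumptions chosen for illustration, not a standard schema, and the standard name in the example is a placeholder.

```python
# Hypothetical field names mapped to the principle each one supports.
REQUIRED_FOR_REUSE = {
    "description": "R1",    # rich, accurate description of the data
    "license": "R1.1",      # clear and accessible usage license
    "creators": "R1.2",     # provenance: who produced the data
    "conforms_to": "R1.3",  # community standard or template that was followed
}


def missing_reuse_fields(record):
    """Return (field, principle) pairs for fields that are absent or empty."""
    return [
        (field, principle)
        for field, principle in REQUIRED_FOR_REUSE.items()
        if not record.get(field)
    ]


draft = {
    "description": "pH readings for a phosphate buffer, including sensor model and calibration date",
    "creators": ["Example Lab"],
    "conforms_to": "example community metadata template",  # placeholder name
}
problems = missing_reuse_fields(draft)      # the draft lacks a license

complete = dict(draft, license="CC-BY-4.0")  # SPDX identifier for a Creative Commons license
no_problems = missing_reuse_fields(complete)
```

A check like this cannot judge whether a description is actually rich enough, but it catches the gaps that most often block reuse, such as a missing license.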
FAIR starts internally
Currently, the FAIR principles are usually mentioned in the context of publishing data or providing Open Access data. But applying FAIR concepts doesn’t start with publishing data. It starts internally, in the day-to-day lab routine, as soon as the data is generated. The data already has to be findable, accessible, interoperable and reusable within one’s own research group or team, so that when the day of publication arrives, it is already in a FAIR state and ready for submission.