Throughout this guide where information relates to the FAIR Data Principles you will see one of these icons.
The FAIR Data Principles are a set of best-practice guidelines for managing the outputs of research, with the ultimate goal of optimising the reuse of research data. But why do you need to think about the reuse of your data? There are many good reasons to plan for the long-term care of your research data, even at the earliest stages of your research project.
Perhaps one of the more compelling arguments is the transparency of your research. Many funding organisations and journals now ask researchers to make the data underpinning their published findings available to their peers, so that the integrity and validity of the research can be checked.
There is also the argument for reusing the data to get maximum value from it. Researchers may not consider the life that the research data they gather or generate can have beyond their own project. Unfortunately, this can lead to data waste, where data are stored indefinitely without yielding any further value, and to data loss, where precious data become inaccessible even to their creator. Both funders and researchers are aware of how much time, effort and expense goes into gathering good quality data, so anything that can be done to ensure the data remain usable into the future is time well spent.
To summarise, research data should be retained in a usable condition long after the study has completed for reasons including:
- The transparency and reliability of the research, whereby published results can be verified.
- Ensuring maximum value is extracted from the data.
- Enabling new research both by yourself and by others.
- Improving access to scientific knowledge and equity in access to high quality data.
- Enabling your data to be linked to other data to allow for new and cross-disciplinary avenues of research.
How do the FAIR Data Principles help? Contemporary research is conducted in a data-rich environment where connectivity between datasets is made possible through a global digital ecosystem. There is great potential for data from anywhere in the world to be shared and re-purposed for new and innovative science. However, even though the Internet is a ‘data rich environment’, both humans and machines often face distinct barriers when attempting to find and process data from this digital ecosystem. Many of the problems stem from the way in which data have been organised and stored.
Addressing this issue in 2016, the authors of the FAIR Data Principles (Wilkinson et al.) wrote:
What constitutes ‘good data management’ is largely undefined, and is generally left as a decision for the data or repository owner. Therefore, bringing some clarity around the goals and desiderata of good data management and stewardship, and defining simple guideposts to inform those who publish and/or preserve scholarly data, would be of great utility.
The FAIR Data Principles were established to overcome data discovery and reuse obstacles by developing a minimal set of community-agreed principles and practices to guide researchers on how to prepare and share their data and research outputs so that they are genuinely available and usable.
Why is FAIR Data important to researchers? The FAIR Data Principles have rapidly come to define best practice in the management of research data. The principles have been readily adopted by publishers, funding organisations, academic and scientific institutions and research communities as a benchmark for research data. It is becoming standard practice for funders of research and for journal publications to require researchers to manage and share their research data in accordance with the FAIR Data Principles. Below you will find some practical steps you can take to make your research data FAIR.
The original paper describing the FAIR Data Principles is available here: Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
FAIR is an acronym and stands for data that are Findable, Accessible, Interoperable and Reusable.
FAIR data does not equate to Open Data. Open data is data that is free to use, reuse, or redistribute. FAIR data is often paired with the maxim ‘as open as possible but as closed as necessary’, in recognition that some valuable research data is very sensitive and cannot be shared as Open Data.
Sometimes there are very legitimate reasons why data cannot be shared as Open Data, for example when data from a patient sample is potentially identifying (personal data), or where participant consent has been agreed for the data to be used in very specific ways. In these situations it is possible to control or manage access to the data via a data repository.
The FAIR Data Principles continue to bring value to sensitive data because:
This document from CONUL Consortium of National and University Libraries (Ireland) outlines some things to consider if you are working with data that is subject to data protection legislation (GDPR) and have plans to make the data available as FAIR data: The intersection of the GDPR and sharing research data as FAIR and Open Data.
FAIR data begins with a good research data management plan.
Research data management (RDM) is the active and ongoing management of data “from its entry to the research cycle through to the dissemination and archiving of valuable results” (Whyte & Tedds, 2011).
Visit our Library guide on research data management for practical advice on how to manage and produce FAIR data.
There are four pillars to FAIR data. In this section each is described using the original description of the FAIR Data Principles published by FORCE11. For each pillar, the authors have provided a set of attributes of data that is Findable (F1-3), Accessible (A1-2), Interoperable (I1-3) and Reusable (R1-1.3). For each pillar we provide a list of practical things that you can do to make your data FAIR.
Practical things you can do to ensure your data are Findable:
► Create rich metadata that describes your data in the fullest detail possible. Advice on creating metadata is described in the RCSI Research Data Management Guide on metadata.
► Share this metadata in a searchable online resource, such as a data repository or catalogue, so others can read about your data, even if they have to contact you to request access to the data. Advice on sharing sensitive data is provided in the RCSI Research Data Management Guide on data sharing.
► Attach a persistent identifier, such as a DOI, to your data, and use this identifier to direct others to your data.
Sometimes the data repository will provide you with guidance on, or a template for, creating the descriptive metadata. This metadata will then be added to the repository's searchable catalogue, so it makes sense to provide as much information as possible. The data repository may also provide the persistent identifier for your data collection. Many repositories use the DOI (Digital Object Identifier) system, and you can cite the DOI in your publications, presentations and on social media, bringing readers directly to your (meta)data online.
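As a rough illustration of the steps above, the sketch below assembles a small descriptive metadata record and turns a DOI into a link that can be cited. The field names, the dataset details and the DOI `10.1234/example-dataset` are all hypothetical placeholders; your repository's own template will define the fields it actually expects.

```python
import json

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a DOI written in any common form into a resolvable web link."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return DOI_RESOLVER + doi


# Hypothetical record: every value here is a placeholder for illustration.
record = {
    "title": "Example blood pressure survey",
    "creators": ["A. Researcher"],
    "description": "Clinic blood pressure readings from adult volunteers.",
    "keywords": ["blood pressure", "hypertension", "survey"],
    "publication_year": 2024,
    "identifier": doi_to_url("doi:10.1234/example-dataset"),
}

print(json.dumps(record, indent=2))
```

The richer the record, the more routes a search engine or catalogue has to your data, so it is worth filling in optional fields as well as mandatory ones.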
Practical things you can do to ensure your data are Accessible:
► Put your data in a trusted data repository that is free to use and available for anyone to search. Advice on finding a data repository is provided in the RCSI Research Data Management Guide on choosing a data repository.
► If you need to restrict access to the data, make sure to select a repository that can provide these controls.
► Make your metadata open access, even if the data cannot be shared openly.
By putting your data in a data repository, you allow the repository to take care of the technical aspects of ensuring the data can be accessed by anyone in the world using a standardised communications protocol, such as HTTPS, the protocol used by web addresses (more on standardised communication protocols). The repository can also continue to display the descriptive metadata, even if the data cannot be accessed yet due to an embargo, or where the data no longer exist because they had to be deleted after a set retention period.
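To make "standardised protocol" a little more concrete, here is a minimal sketch of how a program could ask for the metadata behind a DOI over HTTPS. It uses the DOI of the original FAIR paper cited in this guide, and it assumes the DOI's registration agency supports content negotiation for the `application/vnd.citationstyles.csl+json` format, as Crossref and DataCite do. The helper names are illustrative, and `fetch_metadata` needs an internet connection, so it is defined but not called here.

```python
import json
import urllib.request

CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_metadata_request(doi: str) -> urllib.request.Request:
    """Prepare an HTTPS request that asks the DOI resolver for metadata."""
    return urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"Accept": CSL_JSON},
    )


def fetch_metadata(doi: str, timeout: float = 10.0) -> dict:
    """Send the request and parse the JSON reply (requires network access)."""
    request = build_metadata_request(doi)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_metadata_request("10.1038/sdata.2016.18")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Because the request is plain HTTPS, the same metadata stays reachable by people and by software, whether or not the underlying data are openly available.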
Practical things you can do to ensure your data are Interoperable:
► Store your data in commonly used file formats, and where possible use non-proprietary (open) formats. Advice on file formats is provided in the RCSI Research Data Management Guide on data collection.
► When you describe your data in the metadata, include words from controlled vocabularies, thesauri or ontologies that are more likely to be understood by your scientific community and picked up by search engines.
'Interoperability' describes how easy it is to open, unpack and understand the data, especially for those not involved in the creation of that data. It is usually assessed by how easily your data can be integrated with data from another source to carry out new research. This usually means that the system and language used to create your data are defined, widely understandable and compatible. While this may sound a little technical, a straightforward approach can be to prepare your data following the established norms of your discipline. For example, statistical data might be prepared and stored as MS Excel (.xls) or, in line with the advice on open formats above, comma-separated values (.csv) files, with accompanying data codebooks.
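As one possible way to follow this advice, the sketch below writes a small table to CSV text together with a codebook that documents each column. The variable names, values and codebook layout are invented for illustration; your discipline may have its own conventions for codebooks.

```python
import csv
import io

# Hypothetical codebook: one entry per column in the data file.
codebook = [
    {"variable": "participant_id", "description": "Pseudonymised participant code", "type": "string", "units": ""},
    {"variable": "age_years", "description": "Age at enrolment", "type": "integer", "units": "years"},
    {"variable": "systolic_bp", "description": "Systolic blood pressure", "type": "integer", "units": "mmHg"},
]

# Hypothetical data rows matching the codebook.
rows = [
    {"participant_id": "P001", "age_years": 54, "systolic_bp": 128},
    {"participant_id": "P002", "age_years": 61, "systolic_bp": 135},
]


def to_csv(records: list, fieldnames: list) -> str:
    """Serialise a list of dictionaries as CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


columns = [entry["variable"] for entry in codebook]

# Check that every column in the data is documented in the codebook.
undocumented = [name for row in rows for name in row if name not in columns]

data_csv = to_csv(rows, columns)
codebook_csv = to_csv(codebook, ["variable", "description", "type", "units"])

print(data_csv)
print(codebook_csv)
```

Saving `data_csv` and `codebook_csv` as two plain-text files side by side means the data can be opened in almost any software, and a new user can see what each column means without contacting you.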
Practical things you can do to make your data Reusable:
► Organise data files according to best practice in your domain so they make sense to other users. Advice on file organisation is available in the RCSI Research Data Management Guide on documentation and data quality.
► Provide clear provenance information on how, why and by whom the data were created and processed. Advice on creating data provenance information (documentation) is available in the RCSI Research Data Management Guide on documentation and data quality.
► Attach a clear user license to the data so that others can see if and how the data can be reused, including guidance on how to request access to restricted data. Advice on attaching a license to your data is available in the RCSI Research Data Management Guide on licensing your data.
The metadata that you provide via the data repository can capture much of the provenance and license information that a new user will need to work out whether they can use your data. Many licenses for data and other research outputs, such as Creative Commons Licenses, are machine readable, which can really help a search engine to locate data that is suitable and ready to use in new research (more on licenses for data). It is important to attach a license to your data, because data published without one leaves potential users unsure whether reuse is permitted at all.
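The sketch below shows, under simple assumptions, what provenance and license information might look like inside a metadata record, and how software could read the license to decide whether reuse terms are stated. The record structure and the `describe_reuse` helper are hypothetical; `CC-BY-4.0` is the standard short identifier for the Creative Commons Attribution 4.0 license.

```python
# Hypothetical metadata record: all dataset details are placeholders.
record = {
    "title": "Example blood pressure survey",
    "license": {
        "id": "CC-BY-4.0",
        "url": "https://creativecommons.org/licenses/by/4.0/",
    },
    "provenance": {
        "created_by": "A. Researcher",
        "collected": "2024-01 to 2024-06",
        "method": "Clinic measurements recorded with a calibrated monitor",
        "processing": "Pseudonymised; implausible readings removed",
    },
}


def describe_reuse(metadata: dict) -> str:
    """Summarise the reuse terms stated in a metadata record, if any."""
    license_info = metadata.get("license")
    if not license_info:
        return "No license stated: reuse terms are unclear."
    return f"Reusable under {license_info['id']} ({license_info['url']})"


print(describe_reuse(record))
print(describe_reuse({"title": "Dataset with no license"}))
```

The second call shows the problem with unlicensed data: nothing in the record tells a person or a search engine what they are allowed to do with it.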
Once you have started to make your data available to others, you can check 'how FAIR are your data' using the checklist below, which was developed by Jones & Grootveld (2017). This checklist is also a useful starting point early in your research project, when you are thinking about ways to produce data that is FAIR.
List of further resources to learn more about FAIR data:
Addressing the FAIR Data Principles in a Data Management Plan - A useful guide on how to integrate the FAIR data principles into your Data Management Plan from UCD Library.