Research Data Management

How to describe your data preservation and sharing plan

At the outset of your study, even before any data collection has taken place, you should be planning what will happen to the data on completion of the study. For this reason, the DMP should address the following questions: 

  1. How will data for preservation be selected and where will they be preserved?
  2. How and when will data be shared? Will you need to restrict access to the data, or place an embargo on access to the data after you have deposited it in a data repository?
  3. Will users need access to specific methods or software tools to re-use your data? 
  4. In order for the data to be findable, how will a unique and persistent identifier be attached to the data?

1. How will data for preservation be selected and where will they be preserved?

RCSI recognizes research data as a valuable institutional asset. Research data must be retained and disposed of securely according to the relevant retention and disposal schedule, in accordance with legal, ethical and research funder requirements, and with particular concern for the confidentiality and security of the data.


RCSI policy on preserving research data: Researchers are responsible for providing access to research data requested by third parties as freely and timely as possible, unless access to the data is restricted for legitimate reasons, which should be stated in the metadata description or research article.


View the RCSI Research Data Management Policy in full.


Who is responsible for preserving the research data from a project? The RCSI Research Data Management Policy applies to all college members engaged in research, including staff and research students, and those who are conducting research on behalf of the College, irrespective of funding. Researchers have the primary responsibility for ensuring research data will be managed in line with funder requirements as well as College policy and other relevant regulations and legislation.

How long should I retain research data? Research data that underpins published results or is considered to have long-term value should be retained, subject to informed consent to do so, where relevant. The current RCSI REC guideline is that research data should be retained for 5-7 years and then destroyed. However, this retention time could be significantly less or more depending on the nature of the study being conducted.

The RCSI Research Data Management Policy states that, in the absence of other provisions, the default period for research data retention is 10 years from the date of last requested access. Retained data must also be deposited in an appropriate, reputable national or international data repository.

However, it is often advisable to retain research data/records for a longer period depending on the nature of the study and the data collected. For example, the Medical Research Council (UK) recommends the following retention schedule for various study designs.

  • For basic research: Research data and related material should be retained for a minimum of 10 years after the study has been completed.
  • For population health and clinical studies: Research data should be retained for 20 years after the study has been completed. 
  • For clinical studies: In some cases, such as for clinical studies involving pregnant participants and those who lack capacity to consent, it has been recommended that a minimum of 25 years may be more appropriate for data retention.

However, longer retention periods for both basic research and population health and clinical studies may be appropriate in some cases. For example:

  • For basic research – Retention periods of 10 years+ may be more appropriate where there is the potential for Intellectual Property to arise (e.g. laboratory notebooks could be retained indefinitely). Similarly, research data relating to studies which directly inform national policymaking should be considered for permanent preservation in an appropriate archive or repository.
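The minimum retention periods quoted above from the Medical Research Council (UK) can be turned into a simple date calculation when you draft the retention section of a DMP. The following is a minimal sketch, not an official tool: the dictionary keys and the function name are hypothetical labels chosen for illustration, and the result is only the earliest date to review the data, since longer retention may be appropriate.

```python
from datetime import date

# Minimum retention periods (in years) as quoted in this guide from the
# Medical Research Council (UK). The keys are hypothetical labels.
MRC_MINIMUM_RETENTION_YEARS = {
    "basic": 10,
    "population_health_or_clinical": 20,
    "clinical_special_case": 25,  # e.g. pregnant participants
}


def earliest_disposal_date(study_completed: date, study_design: str) -> date:
    """Return the earliest date on which disposal could be considered."""
    years = MRC_MINIMUM_RETENTION_YEARS[study_design]
    try:
        return study_completed.replace(year=study_completed.year + years)
    except ValueError:
        # 29 February in a non-leap target year: fall back to 28 February.
        return study_completed.replace(year=study_completed.year + years, day=28)


print(earliest_disposal_date(date(2024, 6, 30), "basic"))  # 2034-06-30
```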

Indicate where the data will be deposited. If no established repository is proposed, demonstrate in the DMP that the data can be curated effectively beyond the lifetime of the grant. It is recommended to demonstrate that the repository's policies and procedures (including any metadata standards and costs involved) have been checked.


RCSI policy on preserving data in a data repository:

  • Retained data must be deposited in an appropriate national or international reputable data repository or as mandated by the funder. This may be specified by the funder or publisher.
  • When depositing research data into external data repositories, repositories that support Open Researcher and Contributor ID (ORCID) should be chosen as far as is practical.
  • View the RCSI Research Data Management Policy in full

There are many benefits to putting your data in a data repository, and the repository can provide you with many of the following services: 

  • Persistent identifier (such as a Digital Object Identifier (DOI)) assigned to your data
  • Assistance with metadata: for example, the data repository will usually provide recommendations or templates for creating metadata about your data
  • Licensing of your data: for example, the data repository will usually provide recommendations or options for selecting a data licence
  • Long-term access to the data and, in some cases, long-term preservation
  • Search and navigation tools, and sometimes visualisation tools for data, which can help with making your data findable
  • A wider audience: your data is more likely to reach new users from anywhere in the world
  • Managed access: if the data require controlled access, some repositories can manage access requests on behalf of the owner of the data

In other words, data repositories can help you to make your data more FAIR.

When choosing a data repository, always start by looking for a broadly recognised, discipline-specific or certified repository in your scientific field. If you cannot find such a repository, or if you're unsure of whether you've found a good home for your data, you can use the following assessment criteria, which we have adapted from Science Europe's Practical Guide to the International Alignment of Research Data Management - Extended Edition.

 

In certain cases, publishers or funders may specify which data repository you must use to deposit your data. However, in most cases you will have to identify a suitable home for your data.

 

There are several resources to help you locate a suitable data repository:

  • re3data.org The Registry of Research Data Repositories is a directory of more than 2,000 data repositories that meet established standards. It is recommended by Horizon Europe for locating an optimal repository for your data.
  • https://fairsharing.org/ FAIRSharing gathers details about repositories, which you can filter by subject, domain and taxonomy.
  • http://www.researchpipeline.com/ Research Pipeline is a privately-maintained list of repositories, including 140 disciplinary databases. This site is updated less often than the above.

As you review potential repositories, ask the following questions to assess their suitability:

  • Is it reputable? For example, is it listed in Re3data thereby meeting their conditions of inclusion?
  • Is it appropriate to my discipline?
  • Does it accept the type of data I want to deposit?
  • Is there a size limit on how much data I can deposit?
  • Is there a charge to deposit, even a one-off fee?
  • Will it provide a persistent identifier for my data such as a DOI number?
  • Does it provide guidance to new users on how the data should be cited?
  • Does it provide access control for my research data?
  • Does it ensure the data will be preserved long term (for the foreseeable future) or is there a time limit on the repository?
  • Does it provide expert help e.g. metadata provision, curation?

See also the RCSI guide on "Where to submit data" created in collaboration with the Consortium of National and University Librarians (CONUL) for more information. https://drive.google.com/file/d/1S8Qc3cDdfziDdwW5ACRA59y2FQuMjMsm/view

If you do not have a suitable, discipline-specific repository for your data you can deposit your data in a generalist data repository. This type of repository will accept a variety of data types and file types, and most have the facility to assign a persistent identifier (PID) to published data.

 

Examples of generalist data repositories:

Dryad Digital Repository The Dryad Digital Repository is a curated resource that makes the data underlying scientific publications discoverable, freely reusable, and citable. Dryad accepts data from any field and in any format, and has dedicated curators to check your files before they are released, and help you follow best practices.

Figshare Figshare is a repository where users can easily upload files up to 5GB to make all of their research outputs available in a citable, shareable and discoverable manner. Any file format is accepted and DOIs are provided. The RCSI Repository uses Figshare and all entries to the Repository are automatically included as part of Figshare, with a 25GB default storage limit. 

Zenodo Zenodo was built and is operated by CERN and OpenAIRE to ensure that everyone can join in Open Science. It welcomes research from all over the world, and from every discipline. Every upload is assigned a DOI, to make them citable and trackable.

If your research involved the use or development of new software, you should make the source code available on a Version Control System (VCS) hosting site such as GitHub or BitBucket. However, these sites do not guarantee the preservation of your code, nor do they support citation, so you should also upload a permanent, archived version of the source code to an approved repository. For example, GitHub is integrated with Zenodo, and Zenodo can provide a DOI registration for the archived source code.

2. How and when will data be shared?

Explain the foreseeable research uses (and/or users) for the retained data.

 

 

'Sensitive data' is data that must be protected against unwanted disclosure, for legal or ethical reasons, for issues pertaining to personal privacy, or for proprietary considerations. At RCSI, many of our research projects work with sensitive data.


If you are handling sensitive data, keep in mind that special attention should be given to collecting, processing, handling and storing data throughout the research process. If you wish to make these data available at the end of the project, you will need to consider this when you are designing your study. In particular, when you are collecting data you will need to ensure you are asking for informed consent to share the data at the end of the project. This might limit your data sharing opportunities; however, you can publish a description of your data (metadata) without making the data itself openly accessible, and you can place conditions around access to published data if necessary. Sensitive data that has been properly anonymised can be shared without breaching data protection regulations.

Anonymisation irreversibly destroys any way of identifying the data subject. Personal data that has been rendered anonymous in such a way that the individual is not or no longer identifiable is no longer considered personal data. For data to be truly anonymised, the anonymisation must be irreversible. OpenAIRE provides researchers with a tool to anonymise data, Amnesia, together with a guide to using it.

 

Pseudonymisation replaces any identifying characteristics of data with a pseudonym, a value which does not allow the data subject to be directly identified. The personal data can only be attributed to a specific data subject with the use of additional information, such as a decryption key. This key should be kept separately, and be subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable individual. Pseudonymisation only provides limited protection for the identity of data subjects, as in many cases it still allows identification using indirect means.
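As an illustration of the idea, the following minimal sketch replaces a direct identifier with a random pseudonym and returns the mapping as a separate key table. It assumes records held as Python dictionaries; the function name and the field name "patient_id" are hypothetical. It is not a compliance tool: the key table must be stored separately and securely, and indirect identifiers (such as age) remain in the records, so re-identification by indirect means is still possible.

```python
import secrets


def pseudonymise(records, id_field):
    """Replace the direct identifier in each record with a random pseudonym.

    Returns the pseudonymised records and a separate key table that maps
    each pseudonym back to the original identifier.
    """
    pseudonym_for = {}  # original identifier -> pseudonym
    pseudonymised = []
    for record in records:
        original_id = record[id_field]
        if original_id not in pseudonym_for:
            pseudonym_for[original_id] = "P-" + secrets.token_hex(8)
        cleaned = dict(record)  # copy, so the input records are not modified
        cleaned[id_field] = pseudonym_for[original_id]
        pseudonymised.append(cleaned)
    key_table = {p: original for original, p in pseudonym_for.items()}
    return pseudonymised, key_table


# Hypothetical example records.
records = [
    {"patient_id": "MRN-001", "age": 54},
    {"patient_id": "MRN-002", "age": 61},
    {"patient_id": "MRN-001", "age": 55},
]
shared, key_table = pseudonymise(records, "patient_id")
```

Only `shared` would be passed on for analysis or deposit; `key_table` is the additional information that allows re-identification and should never travel with the data.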

 

You must comply with Irish law; please see the Data Protection Commission's Guidance on Anonymisation and Pseudonymisation for more information. Both the Australian National Data Service (ANDS) guidelines on Publishing and Sharing Sensitive Data and the OpenAIRE guide on How to Deal with Sensitive Data provide further information on dealing with and sharing sensitive data.

In your DMP you should indicate whether data will be shared via a repository, whether requests will be handled directly, or whether another mechanism will be used.

Sensitive and confidential data can be safeguarded by regulating or restricting access to and use of the data. Access controls should always be proportionate to the kind of data and level of confidentiality involved. When regulating access, consider who would be able to access your data, what they are able to do with it, whether any specific use restrictions are required, and for how long you want the data to be available. 


The three levels of data access, according to the UK Data Service, are:

Open Data: Data that can be accessed by any user for any reason, including commercial. Data in this category should not contain personal information unless consent is given.

Safeguarded Data: Data that contain no personal information, but where the data owner considers there to be a risk of disclosure resulting from linkage to other data.

Controlled Data: Data that may be disclosive. Data are generally only available to users through a relevant Data Access Committee, which may mandate training or other protective measures as appropriate.

Additionally, most data repositories will allow you to place a temporary embargo on your data. During the embargo period, the description of the dataset is published, but not the actual data. The data themselves will become available to access after the embargo period ends.

Do you need exclusive use of the data while you finalise your publication? Will you need to embargo access to the data for a period of time? 

Sometimes there are legitimate reasons for not sharing some or all research data generated by a project. Funders who require data sharing will generally ask that researchers justify this decision in their Data Management Plan (DMP).

 

It is generally possible to choose not to share research data using the following criteria, which have been adapted from the European Commission Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020. 


Some reasons why it might not be possible to share data include: 

  • Data are commercially sensitive

  • Data are confidential (due to third-party obligations)

  • Sharing data would break data protection regulations

  • Sharing would mean that the project's main aim might not be achieved

  • Data are generated under an industry funded or co-funded project

  • Sharing of the data may impact on future plans to protect intellectual property 

 

Please see the sections on Ethical Considerations and Data protection for further information on the limitations of sharing research data, and the importance of informed consent and ethical approval.  

What does it mean to license your data? In the DMP template you might see a question such as,

How will other legal issues, such as intellectual property rights and ownership, be managed? (Science Europe DMP template)

A license agreement is a legal arrangement between the creator/depositor of the data and the data repository, stating clear re-use rights to help others understand what they are allowed to do with your data. To make re-use as likely as possible, it is recommended that you choose a licence which:

  • Makes data available to the widest audience possible
  • Makes the widest range of uses possible

To answer this question you need to think about who is the owner of the research data from your study (the PI? the funder? a consortium? a third-party organisation?) and whether you have the ownership rights to make the data available to others. If you can make the data available to others, will you need to restrict how third parties can use this data? For example, maybe the data can only be used for non-commercial purposes. Maybe you'd like to be cited as the origin of that data every time someone uses the data in the future. All of this can be clarified in the license agreement.


It is imperative that the intellectual property rights (IPR) pertaining to the data are established before any licensing takes place. If your research contains data from third parties (e.g., data from a health or hospital system) you should ensure you have the permission of the rights holder to share this data, or that the data is covered by licences that permit the sharing of data, before you put it in the data repository.

Creative Commons licenses are commonly applied to research data because 

  • a CC license gives you a way to grant others permission to use your data under copyright law, and
  • a CC license gives clarity to new users of the data what they are allowed to do with the data 

There are six different types of Creative Commons license, ranging from the most to the least permissive. Creative Commons licenses allow the copyright holder to retain copyright ownership of their works while allowing others to use the work under certain conditions specified by the chosen licence. A full description of these licences is available from Creative Commons.

  • CC BY Attribution
    Users can distribute, remix, tweak, and build upon a work, even commercially, as long as they give credit to the original creator of the work. 
  • CC BY-SA Attribution-ShareAlike
    Users can remix, tweak, and build upon a work even for commercial purposes, as long as they credit the original creator and license any new creations under identical terms.  
  • CC BY-ND Attribution-NoDerivs
    Users can copy and redistribute the material in any medium or format for any purpose, even commercially, as long as it is passed along unchanged and credit is given to the original creator.
  • CC BY-NC Attribution-NonCommercial
    Users can copy and redistribute the material in any medium or format and remix, transform, and build upon the material but any new works must be non-commercial and give credit to the original creator.
  • CC BY-NC-SA Attribution-NonCommercial-ShareAlike
    Users can copy and redistribute the material in any medium or format and remix, transform, and build upon the material but any new works must be non-commercial, give credit to the original creator and be licensed under identical terms.
  • CC BY-NC-ND Attribution-NonCommercial-NoDerivs
    This licence is the most restrictive. Users can copy and redistribute the material, but they must credit the original creator and cannot change the work in any way or use it commercially.
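The six licences above differ in only three choices: whether commercial use is allowed, whether derivative works are allowed, and whether derivatives must be shared under identical terms. The following minimal sketch, using a hypothetical helper function, shows how those choices map onto a licence name. It is an aid to thinking, not legal advice.

```python
def choose_cc_licence(allow_commercial: bool, allow_derivatives: bool,
                      share_alike: bool = False) -> str:
    """Map three re-use decisions onto one of the six Creative Commons licences."""
    if share_alike and not allow_derivatives:
        # Share-alike governs derivatives, so it cannot be combined with NoDerivs.
        raise ValueError("Share-alike only applies when derivatives are allowed")
    parts = ["BY"]  # every one of the six licences requires attribution
    if not allow_commercial:
        parts.append("NC")
    if not allow_derivatives:
        parts.append("ND")
    elif share_alike:
        parts.append("SA")
    return "CC " + "-".join(parts)


print(choose_cc_licence(allow_commercial=True, allow_derivatives=True))    # CC BY
print(choose_cc_licence(allow_commercial=False, allow_derivatives=False))  # CC BY-NC-ND
```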

An open-source licence is a set of conditions that grants the users of your software certain rights to use, copy, modify, and possibly redistribute the source code or content of the software. It also asserts your authorship. There are several licensing options for open source software, including: 

  • MIT License – permits any person to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the software as long as a copy of the license notice is included with any reuse
  • GNU General Public License - users can copy, distribute, and modify the software as long as any modifications are also licensed under the GPL
  • Apache license 2.0  - allows users to use the software for any purpose, to distribute it, to modify it, and to distribute modified versions of the software as long as a copy of the license is redistributed with any modified software

Additional information is available from the Software Sustainability Institute  and Open Source Initiative.

3. Will users need specific software to re-use the data? 

Specific software: If potential users of your research data would need access to specific tools to be able to reuse the data, you should indicate that in your DMP and provide sufficient details on what software and what version would be required. If the software is available for download, provide information on where they can access a copy.


File formats: The ability to read your data in the future depends on the file format, so you are strongly encouraged to use standard, exchangeable or open file formats. You can store data in a proprietary format where it is the de facto format within your disciplinary area, or where the format is supported across a range of software (so you are not locked into one type of software). The go-to guidance on file formats is the Library of Congress (LOC) Recommended Formats Statement, which is updated each year, as this is a constantly evolving topic.

 

When choosing an electronic file format to create and store data, it's important to consider whether the format is open and/or ubiquitous. The format you use determines how accessible these data are to other users, as some files can only be opened if you have a license for the software that created them. File format also determines how accessible the data will be to yourself and others into the future: technology evolves quickly, and the software that you use today will become obsolete in time.

Why use open file formats? File formats that are open or non-proprietary will tend to remain accessible, even if the software that created them is no longer available. However, formats which are ubiquitous or have become the default standard within a discipline, whether proprietary or not, are also likely to be maintained into the future.

What if you have a preferred software? If you find it necessary or convenient to work with a proprietary format, it may be useful to store your data using that format for data collection and analysis, while also storing a copy in an open or accessible format for sharing or archiving once your project is complete.

Which format is best for FAIR data? Many data archives and repositories will already have recommended file formats based on best practice within the disciplines they support. 

When choosing a file format you should consider the following:

  • How you plan to analyse your data
  • Which software and file formats you and your colleagues have used in the past
  • Any discipline specific norms or technical standards
  • Whether file formats are at risk of obsolescence because of their dependence on a particular technology.
  • Which formats are best to use for the long-term preservation of data
  • Whether important information might be lost by converting between different formats


File formats likely to be accessible into the future (from DMPTool Guidance):

  • Non-proprietary
  • Open, with documented standards
  • In common usage by the research community
  • Using standard character encodings (e.g., ASCII, UTF-8)
  • Uncompressed (space permitting)

Examples of preferred format choices (from DMPTool Guidance):

  • Image: JPEG, JPEG 2000, PNG, TIFF
  • Text: plain text (TXT), HTML, XML, PDF/A
  • Audio: AIFF, WAVE
  • Containers: TAR, GZIP, ZIP
  • Databases: prefer XML or CSV to native binary formats

For more information on recommended formats, see the UK Data Service guidance on recommended formats.
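Where your working data live in a proprietary or binary format, one common approach is to export a copy to CSV with UTF-8 encoding for sharing or archiving. The following is a minimal sketch using only Python's standard library; the function name, column names and sample values are hypothetical, and it writes to a temporary folder purely for demonstration.

```python
import csv
import tempfile
from pathlib import Path


def export_to_csv(rows, fieldnames, path):
    """Write rows (a list of dicts) to a UTF-8 encoded CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical measurements, including a non-ASCII place name.
rows = [
    {"sample_id": "S1", "site": "Dún Laoghaire", "value_mmol_l": 5.4},
    {"sample_id": "S2", "site": "Dublin", "value_mmol_l": 6.1},
]
with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "measurements.csv"
    export_to_csv(rows, ["sample_id", "site", "value_mmol_l"], target)
    exported_text = target.read_text(encoding="utf-8")
```

Reading the file back, as in the last line, is a quick check that nothing was lost in the conversion, which is one of the considerations listed above.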

4. How will a unique and persistent identifier be attached to the data?

Indicate whether a persistent identifier (PID) will be pursued for the data. Typically, a trustworthy, long-term repository will provide a persistent identifier.

Persistent identifiers, or PIDs, are the backbone of data citation. If someone wants to replicate your analysis they will need to be able to find the correct copy of the data that you used. By including a persistent identifier in your data citation you enable readers to identify and navigate to the exact version of the data that you used in your research. The persistent identifier is preferable to a less stable reference point such as a URL (website) address, as persistent identifiers are slow to expire and the data is more likely to be findable for many years. There are several types of persistent identifier used to identify datasets, but DOI numbers are most commonly used. Please find more information on DOIs below.


 

A DOI number is a string of numbers, letters and symbols used to permanently identify an article or document and link to it on the web. DOIs are commonly used to identify a research data resource online, and their strength is that they provide a unique identifier for the file or collection of files and provide an easy way to locate these files online. They are superior to web address links (URLs) as while a web address (URL) might change, the DOI will never change, plus they tend to be shorter and easier to cite than a web address. 


Here's an example of what a DOI looks like in a data citation: 
Smith, J., and Jones, P.  (2023). Environmental risk factors for autism [dataset]. Dryad Digital Repository [distributor]. doi: 10.1234/abcd123
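A DOI can be turned into a clickable link by appending it to the https://doi.org/ resolver address. The following minimal sketch, with a hypothetical helper function, tidies a DOI written in any of the common styles into such a link. Note that 10.1234/abcd123 is this guide's placeholder, so the resulting link will not resolve to a real dataset.

```python
def doi_to_url(doi: str) -> str:
    """Turn a DOI, with or without a common prefix, into a resolver link."""
    cleaned = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):].strip()
            break
    if not cleaned.startswith("10."):
        # All DOIs begin with the directory indicator "10."
        raise ValueError(f"Not a DOI: {doi!r}")
    return "https://doi.org/" + cleaned


print(doi_to_url("doi: 10.1234/abcd123"))  # https://doi.org/10.1234/abcd123
```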


Many data repositories will assign a persistent identifier to your data once you publish the dataset (or metadata about the dataset) on their platform. For example, on Zenodo, data uploads are made available online as soon as you hit publish, and your DOI is registered within seconds.

RCSI policy on data availability statement: 

A statement describing how and on what terms any supporting data may be accessed must be included in published research outputs.

View the RCSI Research Data Management Policy in full

A data availability statement is a short statement at the end of a research article that describes how, where, and under what conditions the data associated with the research article can be accessed. All research articles should include a data availability statement, even when there is no data associated with the article (more on this below), as this is an important step in giving credit to data creators, and in supporting the reproducibility of research.


In journal publications, the data availability statement usually appears at the end of a journal article before the ‘references’ section. The author(s) of the article write the data availability statement, and you should always include this statement in your article prior to submission for publication.

The data availability statement provides clear information on where the data can be accessed, and whether access to the data is open or restricted in some way. It should also provide a digital reference or link to where the data can be found online. Statements to the effect of "data available from authors" or "data will be made available on request" are not acceptable as a data availability statement, as they do not provide sufficient information to genuinely enable access to the data.

 

A data citation is an entry for a dataset within the reference list of an article, book, conference proceeding, or other document. Data citations are captured by standard citation counting methods if they are included in the reference list. However, it is unfortunately still common for researchers not to cite data correctly in their reference list, or not to include sufficient information on the source of their research data.

It's important to cite data in your publications, in just the same way you would articles, books, images and websites, as a dataset is a source of evidence to support your argument. The UK Data Service has provided a useful video summarising why it is important to cite data correctly.


The UK Data Service highlights the following benefits of data citation to researchers and to science in general: 

Transparency: Citing data is a way of clearly showing exactly which version of which dataset has underpinned or influenced research, as well as crediting those who have made the work possible by collecting the data.

Reproducibility: It helps future researchers to find out which data the researcher has used and enable the research to be reproduced to assess its integrity. Louise Corti, Director of Collections Development and Data Publishing for the UK Data Service, has written a great blog about research reproducibility in qualitative research: Show Me the Data.

Helping track the use of the data: Researchers who [share data] want to know that the data is being used, just like any other researchers want to know that their book or article has been used to support others’ research. In addition, bodies that fund the collection of this data want to know that their funding has produced value. It can also help researchers in gaining further funding for future data collection and analysis. Susan Noble wrote a great post looking at finding out what people have done with data we provide and its impact.

Measuring impact: Researchers want their books, articles and data to make a difference to others, whether this is on future research, influencing policy or positively changing the lives of individuals, communities or society. Citing data, like citing any other research, helps [repositories] in measuring and reporting on this impact.

Source: Spotlight on #CiteTheData: Make the data count – Data Impact blog (ukdataservice.ac.uk)

According to the ICPSR, the elements of a data citation are: 

  • Author: Name(s) of each individual or organizational entity responsible for the creation of the dataset.
  • Date of Publication: Year the dataset was published or disseminated.
  • Title: Complete title of the dataset, including the edition or version number, if applicable.
  • Publisher and/or Distributor: Organizational entity that makes the dataset available by archiving, producing, publishing, and/or distributing the dataset.
  • Electronic Location or Identifier: Web address or unique, persistent, global identifier used to locate the dataset (such as a DOI). Append the date retrieved if the title and locator are not specific to the exact instance of the data you used.

These are the minimum elements required for dataset identification and retrieval. Fewer or additional elements may be requested by author guidelines or style manuals. Be sure to include as many elements as needed to precisely identify the dataset you have used.
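To show how these elements fit together, the following minimal sketch assembles them into a citation string in the style used for the examples in this guide. The class and field names are hypothetical, and the punctuation is one possible layout; always follow the style manual or author guidelines you have been asked to use.

```python
from dataclasses import dataclass


@dataclass
class DatasetCitation:
    authors: str      # e.g. "Smith, J., and Jones, P."
    year: int         # year the dataset was published
    title: str
    distributor: str  # publisher and/or distributor
    identifier: str   # DOI, archive number or web address

    def format(self) -> str:
        """Join the five elements into a single reference-list entry."""
        return (
            f"{self.authors} ({self.year}). {self.title} [dataset]. "
            f"{self.distributor} [distributor]. {self.identifier}"
        )


citation = DatasetCitation(
    authors="Smith, J., and Jones, P.",
    year=2023,
    title="Environmental risk factors for autism",
    distributor="Dryad Digital Repository",
    identifier="doi: 10.1234/abcd123",
)
print(citation.format())
```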

Example of published dataset citation with an archive number: 

 

TILDA. (2019). The Irish Longitudinal study on Ageing (TILDA) Wave 4, 2016. [dataset]. Version 4.0. Irish Social Science Data Archive. SN:0053-05. www.ucd.ie/issda/data/tilda/wave3 

 

Example of published dataset citation with a DOI number: 

 

Smith, J., and Jones, P.  (2023). Environmental risk factors for autism [dataset]. Dryad Digital Repository [distributor]. doi: 10.1234/abcd123

 

Example of an unpublished dataset citation:

 

Smith, J., and Jones, P.  (2023). Environmental risk factors for autism [unpublished raw dataset]. Royal College of Surgeons in Ireland. 

 

Example of published dataset citation from an organisation or research group: 

 

Health Service Executive. (2019). General Referrals by Hospital, Department and Year 2019. [dataset]. HSE Open Data [distributor]. 

 

Example of published dataset from individual authors: 

 

Smith, J., and Jones, P.  (2023). Environmental risk factors for autism [dataset]. Dryad Digital Repository [distributor]. doi: 10.1234/abcd123

 


For a deep dive into Data Citation see: Ball, A. & Duke, M. (2015). ‘How to Cite Datasets and Link to Publications’. DCC How-to Guides. Edinburgh: Digital Curation Centre. Available online: /resources/how-guides

You should include the following three pieces of information in your data availability statement:

  • Location of the data: If your study involved collecting or producing new data, you should upload this data to a suitable online data repository. All of the data should be stored together as a single dataset, ideally in a domain-specific repository for your area of research. In your data availability statement, you then name the repository where the data is located. If your study involved re-using data that was collected or produced by a third party, you should provide information on where this data can be accessed.
  • Identifier for the data: Ideally, you should provide a persistent identifier (PID), which is a long-lasting digital reference to a document, file, web page, or other object online, and is more stable than a URL. When you provide a persistent identifier, such as a DOI, it is much easier for the reader to locate your data online. Usually, once you upload your data to a data repository and hit the 'publish' button, a unique and persistent identifier is assigned to the dataset. It's important to include a persistent identifier in your data availability statement, as this helps the reader find the exact dataset you're referring to.
  • License information: It's important to apply a license to your research data, as this makes it clear what somebody else can do with the data. Data repositories often prompt you to choose from a range of Creative Commons license options. For example, if you want to enable others to use, adapt, or build on your work, while giving you appropriate credit for the data, then you might apply a Creative Commons Attribution (CC-BY) license. If you want to enable others to use your data, but don't want it to be used commercially, you might apply a Creative Commons Attribution-NonCommercial (CC BY-NC) license. For the full list of options for licensing data, see the Creative Commons license options.
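As a small illustration of the identifier point in the list above: a DOI can be written in several ways (a bare DOI, with a "doi:" prefix, or as a link), and readers can follow it most easily when it is given as an https://doi.org/ link. The Python sketch below normalises these forms. The function name `doi_to_url` is hypothetical, and the pattern used to recognise a DOI is a simple heuristic assumed for this example, not an official validation rule.

```python
import re

# Heuristic shape of a DOI: "10.", a numeric registrant prefix, "/", a suffix.
# This is an assumption made for illustration, not a formal definition.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return an https://doi.org/ link for a DOI written in a common form."""
    cleaned = doi.strip()
    # Remove a leading resolver address or "doi:" label if one is present.
    cleaned = re.sub(
        r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", cleaned, flags=re.IGNORECASE
    )
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"Not a recognisable DOI: {doi!r}")
    return f"https://doi.org/{cleaned}"


for raw in ("10.5281/zenodo.3723939", "doi: 10.1234/abcd123"):
    print(doi_to_url(raw))
```

Identifiers that are not DOIs, such as repository accession numbers, are rejected by this sketch and should be cited together with the repository URL instead.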

Use the following examples to guide you in constructing a data availability statement. Remember to include at a minimum the following three pieces of information: 

  1. Location of the data
  2. Identifier for the data
  3. License information
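The three pieces of information listed above can be treated as required inputs to a template. The following Python sketch is a minimal illustration, assuming openly available data; the function name `data_availability_statement` and its parameter names are hypothetical, and the wording follows the open-access example text used in this guide. Other access situations (restricted, embargoed, third-party data) need different wording.

```python
def data_availability_statement(
    repository: str, identifier: str, license_name: str
) -> str:
    """Compose a statement for openly available data from its three parts."""
    required = {
        "location (repository)": repository,
        "identifier": identifier,
        "license": license_name,
    }
    # Refuse to build a statement if any of the three pieces is missing.
    for label, value in required.items():
        if not value.strip():
            raise ValueError(f"Missing {label} for the data availability statement")
    return (
        "The data that support the findings of this study are openly available "
        f"in {repository} at {identifier} under the terms of the "
        f"{license_name} license."
    )


print(
    data_availability_statement(
        repository="Zenodo.org",
        identifier="https://doi.org/10.5281/zenodo.3723939",
        license_name="Creative Commons Attribution 4.0 (CC-BY 4.0)",
    )
)
```

Raising an error on an empty value is a deliberate choice here: a statement without a location, identifier, or license is incomplete, so it is better to fail early than to publish a gap.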

The examples below are grouped by how accessible the data are. Each group gives the access situation, what to say in your data availability statement, and example text.

Data are openly accessible in a data repository.

The data that support the findings of this study are openly available in [insert repository name] at https://doi.org/[insert DOI number], dataset reference number [insert reference number].
 

Example 1: The data that support the findings of this study are openly available in Zenodo.org at https://doi.org/10.5281/zenodo.3723939 under the terms of the Creative Commons Attribution 4.0 (CC-BY 4.0) license. 


Example 2: Repository: An atom-efficient, single-source precursor route to plasmonic CuS quantum dots. https://doi.org/10.5256/repository.4591.d34639. Data are available under the terms of the Creative Commons Zero “No rights reserved” data waiver (CC0 1.0 Public domain dedication). 

Data are openly available in a repository that does not issue DOIs.

The data that support the findings of this study are openly available in [insert repository name] at [insert URL], reference number [insert reference number assigned to this dataset by the repository].
 

Example 1: The data that support the findings of this study are openly available in GEO DataSets at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE68849, GEO accession number GDS5660. Data are available under the terms of the Creative Commons Attribution 4.0 (CC-BY 4.0) license.

Example 2: NCBI Gene: Ihe1 intestinal helminth expulsion 1 [Mus musculus (house mouse)]. Accession number 107537. Data are available under the terms of the Creative Commons Zero “No rights reserved” data waiver (CC0 1.0 Public domain dedication).

Data are derived from public domain resources.

The data that support the findings of this study are available in [insert repository name] at [insert URL or DOI], reference number [insert reference number].     
 

Example: The datasets that support the findings of this study are openly available in Data.gov.ie under the terms of the Creative Commons Attribution 4.0 (CC-BY 4.0) license at the following locations:

COVID-19 HSE Weekly Booster Vaccination Figures: https://data.gov.ie/dataset/covid-19-hse-weekly-booster-vaccination-figures2?package_type=dataset

Pobal HP - Deprivation Index Scores - 2016: https://data.gov.ie/dataset/hp-deprivation-index-scores-2016/resource/6480bb69-023c-47f2-813f-8689bacafa54

Data were generated at a central, large-scale facility, available upon request.

Raw data were generated at [insert facility name]. Derived data supporting the findings of this study are available from [describe procedure for applying for access to the data].
 

Example: Raw data were generated at FutureNeuro at RCSI and Trinity College Dublin. Derived data supporting the findings of this study are available from the corresponding author [G.C.] on request. 

Data are not publicly available, but available to researchers with appropriate credentials in line with consent agreed with respondents.  
   

Due to confidentiality agreements, access to the data that support the findings of this study is restricted to bona fide researchers and is subject to a non-disclosure agreement. Details of the data and how to request access are available from [insert repository where data reside / name of data manager at host institution].  

Example: The Anonymised Microdata Files (AMF) for the Growing Up in Ireland Child Cohort (9 years) are available via the Irish Social Science Data Archive (ISSDA) for bona fide research purposes only and are subject to an end user agreement. Details of the data and how to request access are available at https://www.ucd.ie/issda/data/growingupinirelandgui/

Data are not publicly available to protect anonymity of participants, although some controlled access is allowed.
 

The data that support the findings of this study are not publicly available due to [describe reason for access restriction, and procedure for applying for access to the data and the conditions under which access will be granted].

Example: The data that support the findings of this study are not publicly available due to restrictions outlined in consent agreements with participants and the identifying nature of the data. Data can be made available upon reasonable request, and in line with the consent agreed with participants, by contacting the authors [C.G. and P. O'H.].

Data are not publicly available, but are available on request, due to privacy/ethical restrictions.

The data that support the findings of this study are not publicly available due to [describe reason for non-sharing of data].

Example: Given the sensitive and identifying nature of the data, and in line with the consent agreed with participants, the data that support the findings of this study are not publicly available.

Data are currently embargoed due to commercial restrictions (e.g. to allow time for commercialization).
  

 
The data that support the findings will be available in [repository name] at [URL / DOI link] following a [6 month] embargo from the date of publication to allow for commercialization of research findings.

Example: The data that support the findings of this study will be available in Zenodo.org at https://doi.org/10.5281/zenodo.3723939 from early 2023, following a 6-month embargo from the date of completion of the study, to allow for commercialization of research findings.

Data are restricted by commercial, industry, patent, government policies, regulations, or laws.
 


 

Due to the nature of the research and to [ethical/legal/commercial] restrictions, supporting data are not available. [If known, describe procedure for applying for access to the data and the conditions under which access will be granted.]

  

Example: Due to commercial restrictions, the Drug Distribution Dataset used in this study is not publicly available. Access to the data can be requested by completing the Data Request form at www.allianceheathcaresample.com/data.

Data are available within the article or its supplementary materials.    
    

The authors confirm that the data supporting the findings of this study are available within the article [and/or] its supplementary materials.

Example 1: The data supporting the findings of this study are available in the supplementary material (Appendix A) of this article. 


Example 2: All data underlying the results are available as part of the article and no additional source data are required.

Data are subject to third party restrictions.     
 

The data that support the findings of this study are available from [third party]. Restrictions apply to the availability of these data, which were used under license for this study. Data are available from [the authors / at URL]. [Describe the procedure you used to access the data.]

   

Example: The Health data from the Quarterly National Household Survey Q3-2010 are made available by the Central Statistics Office. Restrictions apply to the availability of QNHS data, which were used under license for this study. Data are available from the Irish Social Science Data Archive at https://www.ucd.ie/issda/data/qnhsmodules/, ISSDA study number 00041-00. Access can be requested by completing an ISSDA Data Request Form for Research.
 
Publication did not use any data.   

For clarity, it's important to include this statement even if there are no data underpinning the article.

Example 1: No data was used for the research described in the article. 

Example 2: No data are associated with this article.

For advice on constructing the data availability statement for data types that are commonly used in the health sciences (e.g., 3D-printable models, chemical and macromolecular structures, neuroimaging data, sequence and 'omics data) please view the author guidance from Health Open Research: https://healthopenresearch.org/for-authors/data-guidelines

If your research involved the use or development of new software, you should include a software availability statement. Your software availability statement should include the name of the repository where the source code at the time of publication (the archived version) is available, a DOI for the archived software, and details of the license under which the software can be used. If possible, you should use an open source license approved by the Open Source Initiative (OSI), which allows software to be freely used, modified, and shared.

Putting it all together

Now that you have reached the conclusion of your research study, to ensure your data are FAIR, you have: 

  • Published your data in a repository / archive which has provided an identifier for the published data.
  • Added rich metadata about the data to the repository / archive.
  • Attached a license to your data, so it is clear how a new user can use the data in a new work.
  • Clearly explained any access restrictions in the metadata and given clear guidance on how to request access.
  • Provided a data citation in the metadata, including the important identifier, and used this citation in your publications.
  • Provided a data availability statement in all of your publications.


Together, your data citation and data availability statement should look something like this: 
Smith, J., and Jones, P.  (2023). Environmental risk factors for autism [dataset]. Dryad Digital Repository [distributor]. doi: 10.1234/abcd123. The data that support the findings of this study are openly available in Dryad Digital Repository at doi: 10.1234/abcd123 under the terms of the Creative Commons Attribution 4.0 (CC-BY 4.0) license.  

Further resources

File formats for preservation

  • Library of Congress Recommended Formats Statement: The Library of Congress has identified preferred and acceptable file formats for textual works and musical compositions, still image works, audio works, moving image works, software and electronic gaming and learning, datasets/databases, and websites.
  • UK Data Service Recommended Formats: Guidance on file formats recommended and accepted by the UK Data Service for data sharing, reuse and preservation.
  • UCD Digital Library Preferred Formats for Data: Preferred formats identified by the UCD Digital Library and Repository which facilitate processing, storage, and dissemination of data, assuring both usability and longer-term durability of the data.

Licensing data

There are several free-to-use tools online to help you find a suitable license for your research data and/or software.