Research Data Management

Data Sharing and Long-term Preservation

According to Science Europe, when developing a data management plan, the fifth topic researchers are required to address is "Data sharing and long-term preservation", which broadly encompasses four main questions:


 How and when will data be shared? Are there possible restrictions to data sharing or embargo reasons? 

  • Explain how the data will be discoverable and shared (e.g. deposition in a trustworthy data repository).
  • Outline the plan for data preservation and give information on how long the data will be retained.
  • Explain when the data will be made available. Explain whether exclusive use of the data will be claimed and if so, why and for how long. Indicate whether data sharing will be postponed or restricted, for example to publish or seek patents.
  • Indicate who will be able to use the data. Explain what action will be taken to overcome or to minimise restrictions. 

 How will data for preservation be selected, and where will data be preserved long-term (e.g. a data repository)?

  • Indicate what data must be retained/destroyed for contractual, regulatory or legal purposes. 
  • Explain the foreseeable research uses (and/ or users) for the retained data.
  • Indicate where the data will be deposited. If no established repository is proposed, demonstrate in the DMP that the data can be curated effectively beyond the lifetime of the grant. It is recommended to demonstrate that the repository's policies and procedures (including any metadata standards and costs involved) have been checked.

 What methods or software tools are needed to access and use data?

  • Indicate whether potential users need specific tools to re-use the data. 
  • Indicate whether data will be shared via a repository, whether requests will be handled directly, or whether another mechanism will be used.

 How will the application of a unique and persistent identifier (e.g. DOI) to each data set be ensured?

  • Indicate whether a persistent identifier (PID) will be pursued for the data. Typically, a trustworthy, long-term repository will provide a persistent identifier.

All researchers should familiarise themselves with the RCSI Research Data Management Policy. RCSI recognises research data as a valuable institutional asset, and the role of research data management in underpinning research excellence and integrity. The RCSI Research Data Management Policy applies to all College members engaged in research, including staff and research students, and those who are conducting research on behalf of the College, irrespective of funding. Researchers have the primary responsibility for ensuring research data will be managed in line with funder requirements as well as College policy and other relevant regulations and legislation. In relation to data sharing and long-term preservation, the Policy states:


  • "Researchers are responsible for providing access to research data requested by third parties as freely and timely as possible, unless access to the data is restricted for legitimate reasons, which should be stated in the metadata description or research article."

  • "Retained data must be deposited in an appropriate national or international reputable data repository or as mandated by the funder. This may be specified by the funder or publisher." 
  • "When depositing research data into external data repositories, repositories that support Open Researcher and Contributor ID (ORCID) should be chosen as far as is practical."
  •  "A statement describing how and on what terms any supporting data may be accessed must be included in published research outputs."

 

How to Share Data


There are numerous reasons why you might want to share your research data, including compliance, transparency, collaboration and efficiency. However, how you intend to share your data needs to be considered from the start, while you are planning your project. You will need to think critically about how your data can be shared, what might limit or prevent data sharing (such as informed consent, confidentiality concerns and legal reasons), and whether any steps can be taken to remove such limitations (such as anonymisation of data). It is highly recommended that data be submitted to a discipline-specific, community-recognised repository wherever possible, or to a multidisciplinary repository if no suitable discipline-specific repository is available. The full range of options for data sharing includes:

  • Data are managed by and available upon reasonable request from the original Researcher or Research Group
  • Data are included with a published article, usually as Supplementary Information files
  • Data are published in a Data Journal, such as Scientific Data
  • Data are available from a Data Archive or Repository, with conditions around access
  • Data are openly available from a Data Archive or Repository (preferred)

 

Sharing Sensitive Data


If you are handling sensitive data, keep in mind that special attention should be given to collecting, processing, handling and storing data throughout the research process. If you wish to make these data available at the end of the project, you will need to consider this when designing your study. In particular, when collecting data you will need to ensure that you ask for informed consent to share the data at the end of the project. This might limit your data sharing opportunities; however, you can publish a description of your data (metadata) without making the data themselves openly accessible, and you can place conditions around access to published data if necessary. Sensitive data that have been properly anonymised can be shared without breaching data protection regulations.


Anonymisation

Anonymisation irreversibly destroys any way of identifying the data subject. Personal data that have been rendered anonymous in such a way that the individual is not or no longer identifiable are no longer considered personal data. For data to be truly anonymised, the anonymisation must be irreversible. OpenAIRE provides researchers with a tool to anonymise data, Amnesia, together with an accompanying guide.

Pseudonymisation

Pseudonymisation replaces any identifying characteristics of data with a pseudonym, a value which does not allow the data subject to be directly identified. The personal data can only be attributed to a specific data subject with the use of additional information, such as a decryption key. This key should be kept separately and be subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable individual. Pseudonymisation provides only limited protection for the identity of data subjects, as in many cases it still allows identification by indirect means.
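The idea of replacing a direct identifier with a pseudonym while keeping the secret separately can be sketched in a few lines of Python. This is a minimal illustration only, not a recommended or compliant procedure: it uses a keyed hash (HMAC-SHA256) rather than encryption, and the field name `patient_id`, the 16-character pseudonym length and the helper names are all hypothetical choices made for the example.

```python
import hashlib
import hmac
import secrets


def generate_key():
    """Create a random secret key. Store it separately from the dataset."""
    return secrets.token_bytes(32)


def pseudonymise(identifier, key):
    """Return a stable pseudonym for an identifier using a keyed hash."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    # Truncated for readability only (an assumption of this sketch).
    return digest.hexdigest()[:16]


def pseudonymise_records(records, key, id_field="patient_id"):
    """Return copies of the records with the direct identifier replaced."""
    output = []
    for record in records:
        copy = dict(record)  # leave the original record untouched
        copy[id_field] = pseudonymise(str(record[id_field]), key)
        output.append(copy)
    return output


if __name__ == "__main__":
    key = generate_key()
    rows = [{"patient_id": "P001", "age": 42}, {"patient_id": "P002", "age": 67}]
    for row in pseudonymise_records(rows, key):
        print(row)
```

The same identifier always maps to the same pseudonym under one key, so records can still be linked. Other fields, such as `age` here, are left unchanged and may still allow indirect identification, which is why pseudonymised data remain personal data.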


You must comply with Irish law; please see the Data Protection Commission's Guidance on Anonymisation and Pseudonymisation for more information. Both the Australian National Data Service (ANDS) guidelines on Publishing and Sharing Sensitive Data and the OpenAIRE guide on How to Deal with Sensitive Data provide further information on dealing with and sharing sensitive data.

 

Access Control


Sensitive and confidential data can be safeguarded by regulating or restricting access to and use of the data. Access controls should always be proportionate to the kind of data and level of confidentiality involved. When regulating access, consider who would be able to access your data, what they are able to do with it, whether any specific use restrictions are required, and for how long you want the data to be available. The three levels of data access, according to the UK Data Service, are:

 

  • Open Data: Data that can be accessed by any user for any reason, including commercial. Data in this category should not contain personal information unless consent is given.
  • Safeguarded Data: Data that contain no personal information, but where the data owner considers there to be a risk of disclosure resulting from linkage to other data.
  • Controlled Data: Data that may be disclosive. These data are generally only available to users through a relevant Data Access Committee, which may mandate training or other protective measures as appropriate.

 

Additionally, most data repositories will allow you to place a temporary embargo on your data. During the embargo period, the description of the dataset is published, but not the actual data. The data themselves will become available to access after the embargo period ends.
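As a rough illustration of how an embargo behaves, the sketch below returns what would be publicly visible for a dataset record on a given day. The record layout (`metadata`, `files`, `embargo_end`) and the function name are hypothetical, and treating `embargo_end` as the first day the files are available is an assumption of this example; individual repositories may define this differently.

```python
from datetime import date


def public_view(record, today):
    """Return the publicly visible parts of a dataset record on a given day."""
    embargo_end = record.get("embargo_end")  # None means no embargo was set
    under_embargo = embargo_end is not None and today < embargo_end

    # The description (metadata) is always published.
    view = {"metadata": record["metadata"], "under_embargo": under_embargo}

    # The data files are only exposed once the embargo has ended.
    if not under_embargo:
        view["files"] = record["files"]
    return view


if __name__ == "__main__":
    record = {
        "metadata": {"title": "Example dataset"},
        "files": ["measurements.csv"],
        "embargo_end": date(2030, 1, 1),
    }
    print(public_view(record, date(2029, 6, 1)))  # metadata only
    print(public_view(record, date(2030, 1, 1)))  # metadata and files
```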

 

Benefits of Data Repositories


A Data Repository allows researchers to upload and publish their data, thereby making the data available for other researchers to re-use. Similarly, a Data Archive allows users to deposit and publish data, but will generally offer greater levels of curation to community standards, have specific guidelines on what data can be deposited, and is more likely to offer long-term preservation as a service. The terms data repository and data archive are sometimes used interchangeably. Data Repositories and Archives can include institutional data repositories, general purpose or multidisciplinary repositories, and discipline-specific data repositories. The RCSI Repository is currently extending its scope from primarily publications to also include research data. However, discipline-specific repositories are often much more suitable for your research data management needs. The services provided by a Data Repository or Archive include:

 

  • Persistent identifier (such as a Digital Object Identifier (DOI))
  • Assistance with metadata provision (e.g. providing templates)
  • Allow you to apply a licence to your data
  • Aid compliance with FAIR data principles
  • Long-term access and, in some cases, long-term preservation
  • Offer useful search, navigation and visualisation functionality
  • Reach a wider audience of potential users
  • Manage requests for data on your behalf

 

Choosing a Data Repository


In certain cases publishers or funders may specify which data repository you should use. If no data repository is specified, ask yourself the following questions when choosing one:

 

Is it reputable? Is it listed in Re3data thereby meeting their conditions of inclusion?

Is it appropriate to your discipline?

Will it take the data you want to deposit?

Is there a size limit?

Does it provide a DOI/persistent identifier?

Does it provide guidance on how the data should be cited?

Does it provide access control for your research data?

Does it ensure long-term preservation/ curation?

Does it provide expert help e.g. metadata provision, curation?

Is there a charge?

 

Please see the re3data Registry of Research Data Repositories for more information and to find a suitable data repository. Re3data is a directory of more than 2,000 data repositories that meet established standards. Re3data promotes Science Europe’s minimum requirements for research data repositories. The Science Europe Core Requirements for Data Management Plans provides further guidance on choosing a trustworthy repository to meet the minimum specified criteria. Please see the RCSI guide on "Where to submit data" created in collaboration with the Consortium of National and University Librarians (CONUL) for more information.

 

Multidisciplinary Data Repositories


A general purpose repository can be used for data preservation if no discipline-specific repository exists. These can handle a variety of different data and file types, and will often assign a Persistent Identifier (PID) to your data. Although charges may apply, these can be included in funding applications. Examples of multidisciplinary data repositories include:

 

Persistent Identifiers


To become citable, your data should include a unique, long-lasting reference, a persistent identifier (PID). Most Data Repositories will automatically assign a persistent identifier to your data. Having a PID is an important part of making sure your data meet the F (Findability) and A (Accessibility) in FAIR data management. Digital Object Identifiers (DOIs) are probably the most commonly used PIDs for research data; however, other active persistent identifier schemes include:

 

Data Licensing


When you publish your data in a data repository, a licence agreement will be applied to your data. A licence agreement is a legal arrangement between the creator/depositor of the data and the data repository, stating clear re-use rights to help others understand what they are allowed to do with your data. To make re-use as likely as possible, it is recommended that you choose a licence which:

  • Makes data available to the widest audience possible
  • Makes the widest range of uses possible

Please see Creative Commons Licences for more information about the types of licences available to you and their different attributions. The Digital Curation Centre (DCC) has also created a guide on "How to License Research Data".

 

Citing Research Data


Data should be considered legitimate, citable products of research. Citing research data not only provides a structured way to recognise and reward data creators, but it also makes data easier to find and promotes the validation of research results. As a result, data citations should be accorded the same importance as citations of other research objects, such as publications. Similar to publications, datasets should be cited using persistent identifiers such as digital object identifiers (DOIs), which, unlike standard web links, allow permanent linkage to the digital object. 
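To illustrate why a DOI gives a more durable link than an ordinary web address, the sketch below turns a DOI written in any of several common forms into a link through the doi.org resolver, which redirects to wherever the dataset currently lives. The function name and the simplified validation pattern are assumptions made for this example; the pattern is a heuristic and not a full statement of DOI syntax.

```python
import re

# Simplified check: "10.", a registrant code of 4-9 digits, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes that are commonly written in front of a DOI.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi):
    """Normalise a DOI string and return its https://doi.org/ link."""
    cleaned = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    if not DOI_PATTERN.match(cleaned):
        raise ValueError("Not a recognisable DOI: " + repr(doi))
    return "https://doi.org/" + cleaned


if __name__ == "__main__":
    # The DOI from the example citation in this section.
    print(doi_to_url("doi:10.5284/1000389"))
    # prints https://doi.org/10.5284/1000389
```

If the repository later moves the dataset to a new address, the DOI record is updated and the same link continues to work.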

According to the Digital Curation Centre (DCC), a comprehensive data citation should include:

  • Author
  • Publication date
  • Title
  • Edition
  • Version
  • Resource Type
  • Publisher
  • Identifier
  • Location

Note that the way in which these elements would be styled and combined together in the finished citation depends on the citation style being used. Example: 

Cool, H. E. M., & Bell, M. (2011). Excavations at St Peter’s Church, Barton-upon-Humber [Data set]. doi:10.5284/1000389
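The way the elements combine into a finished citation can be sketched as follows. This is a minimal example that reproduces the style of the citation above using only some of the listed elements; the class name, field names and the placement of the optional version and publisher are assumptions of this sketch, and other citation styles will order and punctuate the elements differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """A subset of the citation elements listed above (names are illustrative)."""

    authors: str
    year: int
    title: str
    identifier: str
    resource_type: str = "Data set"
    version: Optional[str] = None
    publisher: Optional[str] = None

    def render(self):
        """Combine the elements in the style of the example citation."""
        title = self.title
        if self.version:
            title += " (Version " + self.version + ")"
        parts = [
            "{} ({}).".format(self.authors, self.year),
            "{} [{}].".format(title, self.resource_type),
        ]
        if self.publisher:
            parts.append(self.publisher + ".")
        parts.append(self.identifier)
        return " ".join(parts)


if __name__ == "__main__":
    citation = DataCitation(
        authors="Cool, H. E. M., & Bell, M.",
        year=2011,
        title="Excavations at St Peter’s Church, Barton-upon-Humber",
        identifier="doi:10.5284/1000389",
    )
    print(citation.render())
```

Keeping the elements as separate fields means the same record can be rendered in whichever citation style a journal or repository requires.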

Closed Data

When Not to Share Data


Sometimes there are legitimate reasons for not sharing some or all research data generated by a project. Funders who require data sharing will generally ask that researchers justify this decision in their Data Management Plan (DMP). Researchers should also be cognisant of any obligations to commercialise their research and of their obligations to industry collaborators. It is generally possible to choose not to share research data on the basis of the following criteria, which have been adapted from the European Commission Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020:

  • Data are commercially sensitive
  • Data are confidential (due to third party obligations)
  • Sharing data would break data protection regulations
  • Sharing would mean that the project's main aim might not be achieved
  • Data are generated under an industry funded or co-funded project
  • Sharing of the data may impact on future plans to protect intellectual property 

Please see the sections on Ethical Considerations and Data protection for further information on the limitations of sharing research data, and the importance of informed consent and ethical approval.