
Research Data Management

Introduction to preservation

According to Science Europe, when developing a data management plan, the fifth topic researchers are required to address is "Data sharing and long-term preservation", which broadly encompasses four main questions:


1. How and when will data be shared? Are there possible restrictions to data sharing or embargo reasons? 

  • Explain how the data will be discoverable and shared (e.g. deposition in a trustworthy data repository).
  • Outline the plan for data preservation and give information on how long the data will be retained.
  • Explain when the data will be made available. Explain whether exclusive use of the data will be claimed and if so, why and for how long. Indicate whether data sharing will be postponed or restricted, for example to publish or seek patents.
  • Indicate who will be able to use the data. Explain what action will be taken to overcome or to minimise restrictions. 

2. How will data for preservation be selected, and where will data be preserved long-term (e.g. a data repository)?

  • Indicate what data must be retained/destroyed for contractual, regulatory or legal purposes. 
  • Explain the foreseeable research uses (and/or users) for the retained data.
  • Indicate where the data will be deposited. If no established repository is proposed, demonstrate in the DMP that the data can be curated effectively beyond the lifetime of the grant. It is recommended to demonstrate that the repository's policies and procedures (including any metadata standards and costs involved) have been checked.


3. What methods or software tools are needed to access and use data?

  • Indicate whether potential users need specific tools to re-use the data. 
  • Indicate whether data will be shared via a repository, whether requests will be handled directly, or whether another mechanism will be used.


4. How will the application of a unique and persistent identifier (e.g. DOI) to each data set be ensured?

  • Indicate whether a persistent identifier (PID) will be pursued for the data. Typically, a trustworthy, long-term repository will provide a persistent identifier.

Preserving your data

All researchers should familiarise themselves with the RCSI Research Data Management Policy.

RCSI recognises research data as a valuable institutional asset, and the role of research data management in underpinning research excellence and integrity. The RCSI Research Data Management Policy applies to all College members engaged in research, including staff and research students, and those who are conducting research on behalf of the College, irrespective of funding. Researchers have the primary responsibility for ensuring research data will be managed in line with funder requirements as well as College policy and other relevant regulations and legislation.

In relation to data sharing and long-term preservation, the Policy states:

  • "Researchers are responsible for providing access to research data requested by third parties as freely and timely as possible, unless access to the data is restricted for legitimate reasons, which should be stated in the metadata description or research article."
  • "Retained data must be deposited in an appropriate national or international reputable data repository or as mandated by the funder. This may be specified by the funder or publisher."
  • "When depositing research data into external data repositories, repositories that support Open Researcher and Contributor ID (ORCID) should be chosen as far as is practical."
  • "A statement describing how and on what terms any supporting data may be accessed must be included in published research outputs."

Sharing your data

There are numerous reasons why you might want to share your research data, including compliance, transparency, collaboration and efficiency. However, how you intend to share your data needs to be considered from the start, while you are planning your project. You will need to think critically about how your data can be shared, what might limit or prevent data sharing (such as informed consent, confidentiality concerns and legal reasons), and whether any steps can be taken to remove such limitations (such as anonymisation of data). It is highly recommended that data be submitted to a discipline-specific, community-recognised repository wherever possible, or to a multidisciplinary repository if no suitable discipline-specific repository is available. More broadly, the options for data sharing include:

  • Data are managed by and available upon reasonable request from the original Researcher or Research Group
  • Data are included with a published article, usually as Supplementary Information files
  • Data are published in a Data Journal, such as Scientific Data
  • Data are available from a Data Archive or Repository, with conditions around access
  • Data are openly available from a Data Archive or Repository (preferred)

If you are handling sensitive data, special attention should be given to collecting, processing, handling and storing data throughout the research process. If you wish to make these data available at the end of the project, you will need to plan for this when you are designing your study. In particular, when you are collecting data you will need to ensure that you ask participants for informed consent to share the data at the end of the project. This might limit your data sharing opportunities; however, you can publish a description of your data (metadata) without making the data themselves openly accessible, and you can place conditions on access to published data if necessary. Sensitive data that have been properly anonymised can be shared without breaching data protection regulations.

 

Anonymisation irreversibly removes any way of identifying the data subject. Personal data that have been rendered anonymous in such a way that the individual is not, or is no longer, identifiable are no longer considered personal data; if the process can be reversed, the data have not been truly anonymised. OpenAIRE provides researchers with a tool for anonymising data, Amnesia, along with a guide to using it.

 

Pseudonymisation replaces any identifying characteristics of data with a pseudonym, a value that does not allow the data subject to be directly identified. The personal data can only be attributed to a specific data subject with the use of additional information, such as a decryption key. This key should be kept separately and be subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable individual. Pseudonymisation provides only limited protection for the identity of data subjects, as in many cases it still allows identification by indirect means.
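To illustrate the idea, the following Python sketch pseudonymises one identifying field in a set of records. It is a minimal example under stated assumptions, not an RCSI or OpenAIRE tool: it derives each pseudonym with a keyed hash (HMAC-SHA256 from the Python standard library) rather than with encryption, so the "additional information" needed for re-identification is the secret key together with a lookup table. The function names (`make_key`, `pseudonymise`, `pseudonymise_records`) and the example field `participant` are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_key() -> bytes:
    """Generate a random secret key. Store it separately from the shared data."""
    return secrets.token_bytes(32)


def pseudonymise(identifier: str, key: bytes) -> str:
    """Map an identifier to a pseudonym using a keyed hash (HMAC-SHA256).

    The same identifier and key always give the same pseudonym, so records
    belonging to one person stay linked, but without the key the pseudonym
    cannot feasibly be traced back to the identifier.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_records(
    records: list[dict], id_field: str, key: bytes
) -> tuple[list[dict], dict]:
    """Replace the identifying field in each record with its pseudonym.

    Returns the pseudonymised records plus a lookup table
    (pseudonym -> original identifier). The lookup table is
    re-identification information and, like the key, must be kept
    apart from the shared dataset.
    """
    shared = []
    lookup = {}
    for record in records:
        pseudonym = pseudonymise(record[id_field], key)
        lookup[pseudonym] = record[id_field]
        masked = dict(record)  # copy so the original record is unchanged
        masked[id_field] = pseudonym
        shared.append(masked)
    return shared, lookup


if __name__ == "__main__":
    # Hypothetical example records; only `shared` would leave the research team.
    key = make_key()
    records = [
        {"participant": "Jane Doe", "age": 34},
        {"participant": "John Roe", "age": 51},
    ]
    shared, lookup = pseudonymise_records(records, "participant", key)
    print(shared)
```

In this sketch only the pseudonymised records would be shared, while the key and lookup table would be stored separately under access controls. The other fields (here, `age`) are left untouched, which is why pseudonymisation on its own can still allow identification by indirect means.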

 

You must comply with Irish law; please see the Data Protection Commission's Guidance on Anonymisation and Pseudonymisation for more information. The Australian National Data Service (ANDS) guidelines on Publishing and Sharing Sensitive Data and the OpenAIRE guide on How to Deal with Sensitive Data both provide further information on handling and sharing sensitive data.

Controlled access to sensitive data

Sensitive and confidential data can be safeguarded by regulating or restricting access to and use of the data. Access controls should always be proportionate to the kind of data and level of confidentiality involved. When regulating access, consider who would be able to access your data, what they are able to do with it, whether any specific use restrictions are required, and for how long you want the data to be available. The three levels of data access, according to the UK Data Service, are:

Open Data: data that can be accessed by any user for any reason, including commercial use. Data in this category should not contain personal information unless consent has been given.

Safeguarded Data: data that contain no personal information, but where the data owner considers there to be a risk of disclosure resulting from linkage to other data.

Controlled Data: data that may be disclosive. These data are generally only available to users through a relevant Data Access Committee, which may mandate training or other protective measures as appropriate.

Additionally, most data repositories will allow you to place a temporary embargo on your data. During the embargo period, the description of the dataset is published, but not the actual data. The data themselves will become available to access after the embargo period ends.
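As a rough illustration of how an embargo separates the published description from the data, here is a minimal Python sketch of what a repository might show a visitor. It assumes, hypothetically, that each dataset record carries an optional `embargo_until` date and that the files become available on that date itself; the `Dataset` class and `public_view` function are illustrative names, not any particular repository's API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Dataset:
    title: str
    description: str                 # metadata: always published
    files: list[str]                 # the data themselves
    embargo_until: Optional[date] = None  # hypothetical field; None = no embargo


def public_view(ds: Dataset, today: date) -> dict:
    """Return what an external visitor can see on a given day.

    The description is always visible. The files are only listed once the
    embargo has ended (assumed here to include the end date itself).
    """
    view = {"title": ds.title, "description": ds.description}
    if ds.embargo_until is None or today >= ds.embargo_until:
        view["files"] = list(ds.files)
    else:
        view["files"] = []
        view["available_from"] = ds.embargo_until.isoformat()
    return view


if __name__ == "__main__":
    ds = Dataset("Survey responses", "Anonymised survey data", ["responses.csv"],
                 embargo_until=date(2030, 1, 1))
    print(public_view(ds, date.today()))
```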

Sometimes there are legitimate reasons for not sharing some or all of the research data generated by a project. Funders who require data sharing will generally ask researchers to justify this decision in their Data Management Plan (DMP). The following criteria, adapted from the European Commission Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020, can generally be used to justify not sharing data. Reasons why it might not be possible to share data include:

  • Data are commercially sensitive

  • Data are confidential (due to third-party obligations)

  • Sharing data would break data protection regulations

  • Sharing would mean that the project's main aim might not be achieved

  • Data are generated under an industry funded or co-funded project

  • Sharing of the data may impact on future plans to protect intellectual property 

Please see the sections on Ethical Considerations and Data protection for further information on the limitations of sharing research data, and the importance of informed consent and ethical approval.  

 

Data preservation at RCSI: data that need to be stored for long retention periods and cannot be deposited in a data repository (e.g. due to sensitivity) can be stored by the Research IT Service using MS Azure Cloud storage, subject to the relevant Records Retention Schedule. The schedule should define a date for data disposal. This is a 'cold storage' option whereby data are archived and thus not accessible externally, while the metadata are stored in the on-premises system at RCSI. Where necessary, data can be retrieved via the Research IT Service at an additional cost. This option can only support FAIR data through a request-for-copy process, not through direct download. The Isilon storage at RCSI can copy files to specific storage in MS Azure if a request is made to the Research IT Service. This is recommended for especially valuable data, such as sequence data that would be costly to recreate.