According to Science Europe, when developing a data management plan, the fifth topic researchers are required to address is "Data sharing and long-term preservation", which broadly encompasses four main questions:
How and when will data be shared? Are there possible restrictions to data sharing or embargo reasons?
How will data for preservation be selected, and where data will be preserved long-term (e.g. a data repository)?
What methods or software tools are needed to access and use data?
How will the application of a unique and persistent identifier (e.g. DOI) to each data set be ensured?
All researchers should familiarise themselves with the RCSI Research Data Management Policy.
RCSI recognises research data as a valuable institutional asset, and the role of research data management in underpinning research excellence and integrity. The RCSI Research Data Management Policy applies to all College members engaged in research, including staff and research students, and those who are conducting research on behalf of the College, irrespective of funding. Researchers have the primary responsibility for ensuring research data will be managed in line with funder requirements as well as College policy and other relevant regulations and legislation.
In relation to data sharing and long-term preservation, the Policy states:
There are numerous reasons why you might want to share your research data, including compliance, transparency, collaboration and efficiency. However, how you intend to share your data needs to be considered from the start, while you are planning your project. You will need to think critically about how your data can be shared, what might limit or prevent data sharing (such as informed consent, confidentiality concerns and legal reasons), and whether there are any steps that can be taken to remove such limitations (such as anonymisation of data). It is highly recommended that data be submitted to a discipline-specific, community-recognised repository wherever possible, or to a multidisciplinary repository if no suitable discipline-specific repository is available. However, the options for data sharing also include:
If you are handling sensitive data, keep in mind that special attention should be given to collecting, processing, handling and storing data throughout the research process. If you wish to make these data available at the end of the project, you will need to consider this when you are designing your study. In particular, when you are collecting data you will need to ensure you are asking for informed consent to share the data at the end of the project. This might limit your data sharing opportunities. However, you can publish a description of your data (metadata) without making the data itself openly accessible, and you can place conditions around access to published data if necessary. Sensitive data that has been properly anonymised can be shared without breaching data protection regulations.
Anonymisation
Anonymisation irreversibly destroys any way of identifying the data subject. Personal data that has been rendered anonymous in such a way that the individual is not or no longer identifiable is no longer considered personal data. For data to be truly anonymised, the anonymisation must be irreversible. OpenAIRE provides researchers with a tool to anonymise data, Amnesia, together with a guide to using it.
Pseudonymisation
Pseudonymisation replaces any identifying characteristics of data with a pseudonym, a value which does not allow the data subject to be directly identified. The personal data can only be attributed to a specific data subject with the use of additional information, such as a decryption key. This key should be kept separately, and be subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable individual. Pseudonymisation provides only limited protection for the identity of data subjects, as in many cases it still allows identification using indirect means.
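As a rough illustration of the pseudonymisation described above, the following Python sketch replaces a direct identifier with a keyed hash (HMAC-SHA256) from the standard library. This is one possible technique, not a method prescribed by the Policy or by the Data Protection Commission. The function names, the `P-` prefix, the example records and the key value are all hypothetical. Note that it uses a keyed hash rather than encryption: the key does not decrypt a pseudonym, it only lets someone who holds the key recompute the pseudonym for a known identifier.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for an identifier using HMAC-SHA256."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # Truncated to 16 hex characters for readability (an illustrative choice).
    return "P-" + digest.hexdigest()[:16]


def pseudonymise_records(records, id_field, secret_key):
    """Return copies of the records with the identifying field replaced."""
    result = []
    for record in records:
        copy = dict(record)  # leave the original record untouched
        copy[id_field] = pseudonymise(str(record[id_field]), secret_key)
        result.append(copy)
    return result


if __name__ == "__main__":
    # Hypothetical key: in practice, generate it randomly and store it
    # separately from the pseudonymised data, under access control.
    key = b"replace-with-a-randomly-generated-key"
    # Made-up example records, not real data.
    participants = [
        {"participant_id": "ID-0001", "age": 54},
        {"participant_id": "ID-0002", "age": 37},
    ]
    for row in pseudonymise_records(participants, "participant_id", key):
        print(row)
```

The remaining fields (here, `age`) are left unchanged, which is exactly why pseudonymised data may still allow identification by indirect means and remains personal data.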
You must comply with Irish State Law. Please see the Data Protection Commission's Guidance on Anonymisation and Pseudonymisation for more information. Both the Australian National Data Service (ANDS) guidelines on Publishing and Sharing Sensitive Data and the OpenAIRE guide on How to Deal with Sensitive Data provide further information on dealing with and sharing sensitive data.
Sensitive and confidential data can be safeguarded by regulating or restricting access to and use of the data. Access controls should always be proportionate to the kind of data and level of confidentiality involved. When regulating access, consider who would be able to access your data, what they are able to do with it, whether any specific use restrictions are required, and for how long you want the data to be available. The three levels of data access, according to the UK Data Service, are:
Additionally, most data repositories will allow you to place a temporary embargo on your data. During the embargo period, the description of the dataset is published, but not the actual data. The data themselves will become available to access after the embargo period ends.
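The embargo behaviour described above can be sketched in a few lines of Python. This is only an illustration of the rule (description always visible, data released once the embargo ends), not how any particular repository implements it. The function name and the dictionary keys are hypothetical, and treating the data as available from the embargo end date itself is an assumption; check your repository's documentation for its exact rule.

```python
from datetime import date


def access_during_embargo(embargo_end: date, today: date) -> dict:
    """Describe what a visitor can see for an embargoed dataset."""
    return {
        # The description of the dataset (metadata) is published immediately.
        "metadata_visible": True,
        # Assumption: the data files are released on the embargo end date.
        "data_available": today >= embargo_end,
    }


if __name__ == "__main__":
    # Hypothetical dates for illustration.
    print(access_during_embargo(date(2026, 1, 1), date(2025, 6, 1)))
    print(access_during_embargo(date(2026, 1, 1), date(2026, 3, 1)))
```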
A data repository allows researchers to upload and publish their data, thereby making the data available for other researchers to re-use. A data archive allows users to deposit and publish data but will generally offer greater levels of curation to community standards, have specific guidelines on what data can be deposited and is more likely to offer long-term preservation as a service. Sometimes the terms data repository and data archive are used interchangeably. Data repositories and archives can include institutional data repositories, general purpose or multidisciplinary repositories, or discipline-specific data repositories. The RCSI Repository is currently extending its scope from primarily publications to also include research data. However, discipline-specific repositories are often much more suitable for your research data management needs. The services provided by a data repository or archive include:
In certain cases publishers or funders may specify which data repository you must use. If no data repository is specified, you should ask yourself the following questions when choosing one:
Is it reputable? Is it listed in Re3data thereby meeting their conditions of inclusion?
Is it appropriate to your discipline?
Will it take the data you want to deposit?
Is there a size limit?
Does it provide a DOI/persistent identifier?
Does it provide guidance on how the data should be cited?
Does it provide access control for your research data?
Does it ensure long-term preservation/curation?
Does it provide expert help e.g. metadata provision, curation?
Is there a charge?
Please see the re3data Registry of Research Data Repositories for more information and to find a suitable data repository. Re3data is a directory of more than 2,000 data repositories that meet established standards. Re3data promotes Science Europe’s minimum requirements for research data repositories. The Science Europe Core Requirements for Data Management Plans provides further guidance on choosing a trustworthy repository to meet the minimum specified criteria. Please see the RCSI guide on "Where to submit data" created in collaboration with the Consortium of National and University Librarians (CONUL) for more information.
A general purpose repository can be used for data preservation if no discipline-specific repository exists. These can handle a variety of different data and file types, and will often assign a Persistent Identifier (PID) to your data. Although charges may apply, these can be included in funding applications. Examples of multidisciplinary data repositories include:
The Dryad Digital Repository is a curated resource that makes the data underlying scientific publications discoverable, freely reusable, and citable. Dryad accepts data from any field and in any format, and has dedicated curators to check your files before they are released, and help you follow best practices.
Figshare is a repository where users can easily upload files up to 5GB to make all of their research outputs available in a citable, shareable and discoverable manner. Any file format is accepted and DOIs are provided. The RCSI Repository uses Figshare and all entries to the Repository are automatically included as part of Figshare, with a 25GB default storage limit.
Zenodo was built and is operated by CERN and OpenAIRE to ensure that everyone can join in Open Science. It welcomes research from all over the world, and from every discipline. Every upload is assigned a DOI to make it citable and trackable.
To become citable, your data should have a unique and long-lasting reference, known as a persistent identifier (PID). Most data repositories will automatically assign a persistent identifier to your data. Having a PID is an important aspect of making sure your data meets the F (Findability) and A (Accessibility) in FAIR data management. Digital Object Identifiers (DOIs) are probably the most commonly used PIDs for research data. However, other active persistent identifier schemes include:
When you publish your data in a data repository, a licence agreement will be applied to your data. A licence agreement is a legal arrangement between the creator/depositor of the data and the data repository, stating clear re-use rights to help others understand what they are allowed to do with your data. To make re-use as likely as possible, it is recommended that you choose a licence which:
Please see Creative Commons Licences for more information about the types of licences available to you and their different attributions. The Digital Curation Centre (DCC) has also created a guide on "How to License Research Data".
Data should be considered legitimate, citable products of research. Citing research data not only provides a structured way to recognise and reward data creators, but it also makes data easier to find and promotes the validation of research results. As a result, data citations should be accorded the same importance as citations of other research objects, such as publications. Similar to publications, datasets should be cited using persistent identifiers such as digital object identifiers (DOIs), which, unlike standard web links, allow permanent linkage to the digital object.
According to the Digital Curation Centre (DCC), a comprehensive data citation should include:
Note that the way in which these elements would be styled and combined together in the finished citation depends on the citation style being used. Example:
Cool, H. E. M., & Bell, M. (2011). Excavations at St Peter’s Church, Barton-upon-Humber [Data set]. doi:10.5284/1000389
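To show how the elements of a data citation might be combined, the following Python sketch assembles a citation in the same style as the example above and builds a resolvable link from the DOI. The function names and the set of fields (authors, year, title, resource type, DOI) are assumptions read off that one example; they are not the DCC's full list of elements, and other citation styles will order and punctuate them differently. The only external fact relied on is that a DOI can be resolved by prefixing it with `https://doi.org/`.

```python
def format_data_citation(authors, year, title, doi, resource_type="Data set"):
    """Combine citation elements in the style of the example above."""
    if len(authors) == 1:
        names = authors[0]
    else:
        # Join all but the last author with commas, then add ", & Last".
        names = ", ".join(authors[:-1]) + ", & " + authors[-1]
    return f"{names} ({year}). {title} [{resource_type}]. doi:{doi}"


def doi_to_url(doi):
    """Turn a bare DOI into a link that resolves through doi.org."""
    return "https://doi.org/" + doi


if __name__ == "__main__":
    citation = format_data_citation(
        authors=["Cool, H. E. M.", "Bell, M."],
        year=2011,
        title="Excavations at St Peter's Church, Barton-upon-Humber",
        doi="10.5284/1000389",
    )
    print(citation)
    print(doi_to_url("10.5284/1000389"))
```

Keeping the elements as separate fields, rather than storing a finished string, makes it straightforward to restyle the same citation for a different referencing style.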
Sometimes there are legitimate reasons for not sharing some or all research data generated by a project. Funders who require data sharing will generally ask that researchers justify this decision in their Data Management Plan (DMP). It is generally possible to choose not to share research data using the following criteria, which have been adapted from the European Commission Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020. Researchers should also be cognisant of any obligations to commercialise their research and of their obligations to industry collaborators.
Please see the sections on Ethical Considerations and Data protection for further information on the limitations of sharing research data, and the importance of informed consent and ethical approval.
Data that needs to be stored for long retention periods, and cannot be put into a data repository (e.g. due to sensitivity), can be stored by the Research IT Service using MS Azure Cloud storage, subject to the relevant Records Retention Schedule. The schedule should define a date for data disposal. This is a 'cold storage' option whereby data are archived and thus not accessible externally, and the metadata are stored in the on-premises system at RCSI. Where necessary, data can be retrieved via the Research IT Service at an additional cost. This option can only support FAIR data through a request-for-copy process, not a direct download process. The Isilon storage at RCSI has the capability to copy files to specific storage in MS Azure, if a request is made to the Research IT Service to do so. This is recommended if data are especially valuable, for example sequence data that would be costly to recreate.