According to Science Europe, when developing a data management plan, the fifth topic researchers are required to address is "Data sharing and long-term preservation", which broadly encompasses four main questions:
1. How and when will data be shared? Are there possible restrictions to data sharing or embargo reasons?
2. How will data for preservation be selected, and where data will be preserved long-term (e.g. a data repository)?
3. What methods or software tools are needed to access and use data?
4. How will the application of a unique and persistent identifier (e.g. DOI) to each data set be ensured?
All researchers should familiarise themselves with the RCSI Research Data Management Policy.
RCSI recognises research data as a valuable institutional asset, and the role of research data management in underpinning research excellence and integrity. The RCSI Research Data Management Policy applies to all College members engaged in research, including staff and research students, and those who are conducting research on behalf of the College, irrespective of funding. Researchers have the primary responsibility for ensuring research data will be managed in line with funder requirements as well as College policy and other relevant regulations and legislation.
In relation to data sharing and long-term preservation, the Policy states:
There are numerous reasons why you might want to share your research data, including compliance, transparency, collaboration and efficiency. However, how you intend to share your data needs to be considered from the start, while you are planning your project. You will need to think critically about how your data can be shared, what might limit or prevent data sharing (such as informed consent, confidentiality concerns and legal reasons), and whether there are any steps that can be taken to remove such limitations (such as anonymisation of data). It is highly recommended that data be submitted to a discipline-specific, community-recognised repository wherever possible, or to a multidisciplinary repository if no suitable discipline-specific repository is available. However, the options for data sharing also include:
If you are handling sensitive data, keep in mind that special attention should be given to collecting, processing, handling and storing data throughout the research process. If you wish to make these data available at the end of the project, you will need to consider this when you are designing your study. In particular, when you are collecting data you will need to ensure you are asking for informed consent to share the data at the end of the project. This might limit your data sharing opportunities; however, you can publish a description of your data (metadata) without making the data themselves openly accessible, and you can place conditions around access to published data if necessary. Sensitive data that have been properly anonymised can be shared without breaching data protection regulations.
Anonymisation irreversibly destroys any way of identifying the data subject. Personal data that have been rendered anonymous in such a way that the individual is not, or is no longer, identifiable are no longer considered personal data. For data to be truly anonymised, the anonymisation must be irreversible. OpenAIRE provides researchers with a tool to anonymise data, Amnesia, for which a guide is also available.
Pseudonymisation replaces any identifying characteristics of data with a pseudonym, a value which does not allow the data subject to be directly identified. The personal data can only be attributed to a specific data subject with the use of additional information, such as a decryption key. This key should be kept separately, and be subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable individual. Pseudonymisation provides only limited protection for the identity of data subjects, as in many cases it still allows identification using indirect means.
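To make the idea of pseudonymisation concrete, the following is a minimal Python sketch rather than a recommended or approved procedure. It assumes the records are held as simple dictionaries; the function name pseudonymise, the field name patient_id and the sample values are hypothetical and used only for illustration. Each direct identifier is replaced with a random pseudonym, and the table linking pseudonyms back to identifiers is returned separately so that it can be stored apart from the data.

```python
import secrets


def pseudonymise(records, id_field):
    """Replace the direct identifier in each record with a random pseudonym.

    Returns (pseudonymised_records, key_table). The key table maps each
    pseudonym back to the original identifier and is the "additional
    information" that must be stored separately from the shared data.
    """
    pseudonym_for = {}  # original identifier -> pseudonym
    output = []
    for record in records:
        original = record[id_field]
        if original not in pseudonym_for:
            # Random value: cannot be derived from the identifier itself.
            pseudonym_for[original] = "P-" + secrets.token_hex(8)
        cleaned = dict(record)  # copy, so the input records are not modified
        cleaned[id_field] = pseudonym_for[original]
        output.append(cleaned)
    key_table = {p: o for o, p in pseudonym_for.items()}
    return output, key_table


# Hypothetical sample data, for illustration only.
records = [
    {"patient_id": "IE-0001", "age": 54, "diagnosis": "asthma"},
    {"patient_id": "IE-0002", "age": 61, "diagnosis": "copd"},
    {"patient_id": "IE-0001", "age": 54, "diagnosis": "asthma follow-up"},
]
shared, key_table = pseudonymise(records, "patient_id")
```

In this sketch the same person receives the same pseudonym across records, so the data remain linkable for analysis. The remaining fields (here age and diagnosis) are left untouched, which is why pseudonymised data may still allow identification by indirect means and continue to be treated as personal data.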
You must comply with Irish State Law; please see the Data Protection Commission's Guidance on Anonymisation and Pseudonymisation for more information. Both the Australian National Data Service (ANDS) guidelines on Publishing and Sharing Sensitive Data and the OpenAIRE guide on How to Deal with Sensitive Data provide further information on dealing with and sharing sensitive data.
Sensitive and confidential data can be safeguarded by regulating or restricting access to and use of the data. Access controls should always be proportionate to the kind of data and level of confidentiality involved. When regulating access, consider who would be able to access your data, what they are able to do with it, whether any specific use restrictions are required, and for how long you want the data to be available. The three levels of data access, according to the UK Data Service, are:
Open Data: Data that can be accessed by any user for any reason, including commercial. Data in this category should not contain personal information unless consent is given.
Safeguarded Data: Data that contain no personal information, but for which the data owner considers there to be a risk of disclosure resulting from linkage to other data.
Controlled Data: Data that may be disclosive. Data are generally only available to users through a relevant Data Access Committee, which may mandate training or other protective measures as appropriate.
Additionally, most data repositories will allow you to place a temporary embargo on your data. During the embargo period, the description of the dataset is published, but not the actual data. The data themselves will become available to access after the embargo period ends.
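The effect of an embargo can be sketched in a few lines of Python. This is an illustrative model only, not the behaviour of any particular repository: the function name visible_parts is hypothetical, and it assumes that the embargo end date is the last embargoed day, so the data become available on the following day. Repositories may define the boundary differently.

```python
from datetime import date


def visible_parts(embargo_end, today):
    """Return which parts of a deposited dataset are publicly visible.

    embargo_end is the last embargoed day (an assumption of this sketch),
    or None if no embargo was set. The description (metadata) is always
    published; the data files only once the embargo has passed.
    """
    parts = ["metadata"]
    if embargo_end is None or today > embargo_end:
        parts.append("data")
    return parts


# Hypothetical dates, for illustration only.
during = visible_parts(date(2025, 6, 30), date(2025, 6, 30))
after = visible_parts(date(2025, 6, 30), date(2025, 7, 1))
no_embargo = visible_parts(None, date(2025, 1, 1))
```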
Sometimes there are legitimate reasons for not sharing some or all research data generated by a project. Funders who require data sharing will generally ask that researchers justify this decision in their Data Management Plan (DMP). A decision not to share research data can generally be justified using criteria adapted from the European Commission Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020. Reasons why it might not be possible to share data include:
Data are commercially sensitive
Data are confidential (due to third party obligation)
Sharing data would break data protection regulations
Sharing would mean that the project's main aim might not be achieved
Data are generated under an industry funded or co-funded project
Sharing of the data may impact on future plans to protect intellectual property
Please see the sections on Ethical Considerations and Data protection for further information on the limitations of sharing research data, and the importance of informed consent and ethical approval.
Data preservation at RCSI: Data that need to be stored for long retention periods, and cannot be put into a data repository (e.g. due to sensitivity), can be stored by the Research IT Service using MS Azure Cloud storage, subject to the relevant Records Retention Schedule. The schedule should define a date for data disposal. This is a 'cold storage' option whereby data are archived and thus not accessible externally, and the metadata are stored in the on-premises system at RCSI. Where necessary, data can be retrieved via the Research IT Service at an additional cost. This option can only support FAIR data through a request-for-copy process, not a direct download process. The Isilon storage at RCSI has the capability to copy files to specific storage in MS Azure, if a request is made to the Research IT Service to do so. This is recommended if data are especially valuable, for example sequence data that would be costly to recreate.