
Research Data Management


Introducing Research Data Management

Research data management (RDM) is defined as the active and ongoing management of data “from its entry to the research cycle through to the dissemination and archiving of valuable results” (Whyte & Tedds, 2011).

RDM is an overarching term encompassing the organisation of research data, their storage, documentation, and curation, culminating in the long-term preservation of these data after the research has been completed.

The best approach to RDM is to start with a data management plan, and you can read advice on writing a data management plan further on in this guide. 

 


Data are a valuable resource that often requires a great deal of time, effort and money to create. Like journal articles, research data are a scholarly output, but data are much more fragile and vulnerable to being lost. There are many good reasons why research data should be managed:

  • Research Quality: Good management helps to prevent errors and increases the quality of your analyses by outlining the steps and quality control measures put in place.
  • Data Security: Good research data management helps you to establish appropriate data storage, back-up and management protocols, reducing the risk of data loss through accidents and neglect.
  • Research Integrity and Validation of Results: Accurate and complete research data are an essential part of the evidence necessary for evaluating and validating research results and for reconstructing the events and processes used to generate them.
  • Research Impact: Research data, if correctly formatted, described and attributed (for example, with persistent identifiers), will have significantly higher visibility and ongoing value, and can continue to have impact long after the completion of a research project.
  • Scientific Inquiry: Good research data management reinforces open scientific inquiry and can lead to new and unanticipated discoveries. Sharing well-managed research data and enabling others to use them will also help to prevent duplication of effort.
  • Funder Requirements: An increasing number of funding bodies (e.g. Health Research Board, Irish Research Council) request or require that their funding recipients create and follow plans for managing data, storing or preserving them in the long term, and sharing some or all data products with the public. A comprehensive data management plan will help ensure that all of your funder requirements are met.

 

Where do you start with Research Data Management? It can be helpful to think of research data management in terms of a research data lifecycle and the data-related activities that take place at stages during this lifecycle. The diagram below from the University of Reading illustrates the research data lifecycle in seven stages.

 

Plan: Identify the data that will be collected or used to answer your research question. This is the stage at which the data management plan is created. Many funders ask for a data management plan to be submitted as part of a research application or within the first six months of starting a new project. 

 

Collect: Data are collected, via experiments, observations, surveys, secondary materials etc. depending on your methodology. You should be actively documenting your data collection, including information on instruments and methods - anything that's necessary to interpret and use the data.

 

Process: Once data have been collected they are processed in order to be usable. This might involve cleaning data to eliminate noise, combining data from multiple sources, transforming data from one state to another (e.g. by format conversion), and using procedures to validate or quality-control data. Any data processing will need to be documented, such that the end result can be replicated from the raw data.
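As an illustration of documenting a processing step so that it can be replicated from the raw data, here is a minimal Python sketch. It is not an RCSI or University of Reading tool; the file layout, the `id` and `value` column names, the cleaning rule and the log format are all assumptions made for the example.

```python
import csv
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, so this exact version can be identified later."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def clean_rows(rows):
    """Drop rows with an empty 'value' field and convert the rest to float (hypothetical rule)."""
    cleaned = []
    for row in rows:
        if row["value"].strip() == "":
            continue  # empty measurements are treated as noise in this example
        cleaned.append({"id": row["id"], "value": float(row["value"])})
    return cleaned


def process(raw_path: Path, out_path: Path, log_path: Path) -> dict:
    """Clean a raw CSV file, write the result, and record what was done in a JSON log."""
    with raw_path.open(newline="") as f:
        rows = list(csv.DictReader(f))
    cleaned = clean_rows(rows)
    with out_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "value"])
        writer.writeheader()
        writer.writerows(cleaned)
    # The log ties the output to the exact raw input and lists each step applied.
    record = {
        "input": raw_path.name,
        "input_sha256": sha256_of(raw_path),
        "output": out_path.name,
        "output_sha256": sha256_of(out_path),
        "steps": ["dropped rows with empty 'value'", "converted 'value' to float"],
        "rows_in": len(rows),
        "rows_out": len(cleaned),
    }
    log_path.write_text(json.dumps(record, indent=2))
    return record
```

Keeping the raw file untouched and storing the log alongside the processed file lets someone else confirm which input was used and repeat the same steps.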

 

Analyse: The raw materials of research are interrogated to produce the insights that constitute the research findings, which will be written up and published in research outputs. Instruments and methods used for analysis should be documented; code written for purposes of data analysis and visualisation may need to be preserved and made available in support of research results.

 

Preserve: Towards the completion of your research you will select the data that are needed to substantiate your research findings, or that have long-term value, and you will preserve these data for the long term. For data to remain accessible and safe in the long term, they must be prepared for preservation and deposited in a suitable location such as a data repository. Preservation activities may involve quality assurance of data, file format conversion, creation of metadata records with assignment of Digital Object Identifiers (DOIs) to datasets, licensing datasets for re-use, and putting in place any required access controls. If the data are confidential or non-digital, they may be held locally, in which case they should be managed by an accountable person or group who can ensure they are stored and preserved properly.

 

Share: Publications based on data should include a data citation or a statement indicating where and on what terms the data can be accessed. A data repository will enable discovery of the data in its care by exposing the metadata online, and will provide access to the data when this is permitted. Data may be made publicly available, or restrictions on access may be imposed where data are of a sensitive or confidential nature. Data held locally or in non-public locations should be managed in such a way that others can discover and apply for access to the data.

 

Re-use: Data that are available for discovery and access may be re-used by other researchers, either to substantiate the findings of the original research, or to generate new insights through further interrogation and analysis. At this stage the data may become raw materials collected within a new cycle of research. Research data may also have other valuable uses, e.g. in policy-making, development of commercial products and services, and teaching.

(Content adapted from The research data lifecycle by the University of Reading) 

Research Data Management Policy

At RCSI, our Research Data Management Policy provides a framework for the management of research data to ensure that research data is stored, retained, made available for use and reuse, and disposed of according to best international practices for data management, as well as in compliance with legal, statutory, ethical, contractual and intellectual property obligations, and the requirements of funding bodies and publishers.

Key points of our Research Data Management Policy

  • The Research Data Management policy applies to all College members engaged in research, irrespective of funding status or career stage.
  • Researchers have the primary responsibility for ensuring research data will be managed in line with funder requirements as well as College policy and other relevant regulations and legislation.
  • Research data must be as compatible as possible with the FAIR data principles, and as open as possible and restricted as necessary.
  • Research data must be preserved for its life-cycle with the appropriate high-quality metadata.
  • A Data Management Plan must be prepared at the start of the project and updated annually.
  • Research data that underpin published results or are considered to have long-term value should be retained (subject to consent). The default period for research data retention is 10 years from the date of last requested access.
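
Because the default retention period runs from the date of last requested access, each new access request moves the end date forward. The short Python sketch below illustrates that arithmetic only; the function name is hypothetical, and the handling of 29 February is an assumption made for the example, not something stated in the policy.

```python
from datetime import date


def retention_end(last_access: date, years: int = 10) -> date:
    """Return the date `years` after the last requested access (hypothetical helper)."""
    try:
        return last_access.replace(year=last_access.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year.
        # Falling back to 28 February is an assumption, not a policy rule.
        return last_access.replace(year=last_access.year + years, day=28)


# Data last requested on 15 March 2020 would be retained until 15 March 2030;
# a further request on 1 June 2025 would extend that to 1 June 2035.
first_end = retention_end(date(2020, 3, 15))
extended_end = retention_end(date(2025, 6, 1))
```

Always check the full policy for the authoritative rule.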

 Read the RCSI Research Data Management Policy in full.

Research Data Management and FAIR data

The FAIR Data Principles are a set of guidelines for best practice in managing the outputs of research, with the ultimate goal of optimising the reuse of research data. The FAIR Data Principles have rapidly come to define best practice in research data management.

 

Visit our FAIR data library guide for practical steps you can take to make your research data FAIR. 

Rethinking Research Data

A short video about sharing research data: Dr Kristin Briney, a Data Services Librarian at the University of Wisconsin-Milwaukee, describes the current research data landscape, how it can be improved to increase scientific reproducibility, and how shared data can be reused in new ways to generate innovations and technologies.