
Research Data Management

Introduction to data repositories

 

Why publish in a data repository? Some researchers choose to publish their underpinning data as supplementary files alongside a journal article. However, this approach often limits the discoverability and long-term preservation of that data. Increasingly, scientific publishers recommend that data underpinning a publication be deposited in a suitable data repository and linked to the publication via a data citation. You can read more about composing your data citation elsewhere in this guide.

 


What is a data repository? According to the NLM:

A repository is a tool to share, preserve, and discover research outputs, including but not limited to data or datasets. Generally speaking, researchers submit and describe their own data which is then ingested into the repository for storage. Other researchers can then download, or request to download, the data directly from the repository. (source: National Library of Medicine)


Repositories support FAIR data. Data repositories offer enhanced features for data preservation, accessibility, and discoverability, including:

  • Persistent Identifiers (PIDs): Repositories assign unique and persistent identifiers (e.g., DOIs) to datasets, ensuring they can be reliably cited and located over time.
  • Metadata Assistance: Repositories guide users in providing comprehensive metadata describing the dataset, enhancing its discoverability and reusability. 
  • Licensing: Repositories facilitate the assignment of appropriate licenses to datasets, clarifying usage rights and promoting responsible data sharing.
  • Long-Term Access: Repositories ensure the long-term preservation and accessibility of datasets, mitigating the risk of data loss or obsolescence.
  • Search Tools: Repositories provide search functionalities that allow researchers to easily discover relevant datasets.
  • Global Reach: Repositories make data accessible to a global audience, fostering collaboration and accelerating scientific discovery.
  • If the data requires controlled access, some repositories can manage access requests on behalf of the data owner.

In certain cases, publishers or funders may specify which data repository you must use to deposit your data. In most cases, however, you will have to identify a suitable home for your data yourself. As you review potential repositories, ask the following questions to assess their suitability:

  • Is it reputable? For example, is it listed in Re3data thereby meeting their conditions of inclusion?
  • Is it appropriate to my discipline?
  • Does it accept the type of data I want to deposit?
  • Is there a size limit on how much data I can deposit?
  • Is there a charge to deposit, even a one-off fee?
  • Will it provide a persistent identifier for my data such as a DOI number?
  • Does it provide guidance to new users on how the data should be cited?
  • Does it provide access control for my research data?
  • Does it ensure the data will be preserved long term (for the foreseeable future), or is there a time limit on the repository?
  • Does it provide expert help, e.g. with metadata provision or curation?

This page provides an introduction to several repositories that are commonly used by researchers at RCSI.

Zenodo

Repository URL: http://zenodo.org/

Zenodo is a general-purpose repository that accepts a wide variety of scholarly content across all scientific disciplines. Zenodo accepts multiple forms of research outputs, including datasets, presentations, images, publications and preprints, and software (via an integration with GitHub). It is a well-known repository that connects with many other RDM tools including ORCID and GitHub. Zenodo is a good solution if you do not have a suitable thematic repository for your data, or if you want to keep different types of content together in one space. 

Zenodo is open to all research outputs regardless of funding source. However, when you upload your research outputs to Zenodo, you can link them to grants from more than 11 funders, such as the European Commission, the National Science Foundation and the Wellcome Trust. Zenodo is particularly suitable for research that is funded by the European Commission, as it is integrated into their reporting lines via OpenAIRE.


Background: Zenodo was developed by the European Organization for Nuclear Research (CERN) and is managed by CERN and OpenAIRE. Files are stored in the CERN Data Center in Switzerland, which provides long-term preservation.

Who is Zenodo for? Zenodo is suitable for all types of research output from all scientific disciplines.

Cost to host data on Zenodo: This service is free.

 

How do I add my research outputs to Zenodo? 

What file formats are accepted on Zenodo? 

  • Zenodo is not domain-specific and therefore accepts research outputs from all areas of Medicine and Health Sciences, alongside all other scientific areas.
  • Zenodo accepts any file format, even 'preservation unfriendly' formats.
  • Zenodo accepts both positive and negative results.

Are there limits on how much I can upload to Zenodo?

  • The total file size limit per record is 50GB (maximum 100 files). A one-time quota of 100GB can be requested and is granted on a case-by-case basis. Contact Zenodo to request this.
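
As a rough pre-upload check, the default limits above can be expressed in a few lines of Python. This is a minimal sketch rather than a Zenodo tool: the function name `fits_zenodo_record` is hypothetical, and it assumes that 1GB means 10^9 bytes, which the limits quoted above do not specify.

```python
from typing import Iterable

# Default per-record limits quoted in this guide.
# Treating 1GB as 10**9 bytes is an assumption.
MAX_TOTAL_BYTES = 50 * 10**9
MAX_FILES = 100


def fits_zenodo_record(file_sizes: Iterable[int]) -> bool:
    """Return True if the planned files fit within one record's default limits."""
    sizes = list(file_sizes)
    return len(sizes) <= MAX_FILES and sum(sizes) <= MAX_TOTAL_BYTES
```

For example, `fits_zenodo_record([20 * 10**9, 35 * 10**9])` returns `False`, because the 55GB total exceeds the 50GB default. In that situation you would need to split the deposit or request a quota increase.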

What type of licensing options are available for my content in Zenodo?

  • Users must specify a license for all publicly available content. Files may be deposited under closed, open, or embargoed access. If you use the embargoed access option, you must provide an end date for the embargo.
  • Files deposited under closed access are protected against unauthorized access at all levels. Licenses for closed access content can be specified in the "Description" field of the metadata. 
  • Files deposited under an embargo status are restricted until the end of the embargo period, at which time the content becomes publicly available automatically.
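
The access rules above can be read as a small checklist. The sketch below encodes them in Python so that a deposit plan can be checked before upload. It is illustrative only: `check_access` and its parameters are hypothetical names, not part of any Zenodo API, and requiring a license for embargoed files (because they become public later) is an assumption.

```python
from datetime import date
from typing import List, Optional

# Access levels named in this guide.
ACCESS_LEVELS = {"open", "embargoed", "restricted", "closed"}


def check_access(access: str,
                 embargo_end: Optional[date] = None,
                 license_id: Optional[str] = None) -> List[str]:
    """Return a list of problems with the chosen settings (empty if none)."""
    problems = []
    if access not in ACCESS_LEVELS:
        problems.append(f"unknown access level: {access}")
    # An embargo must have an end date.
    if access == "embargoed" and embargo_end is None:
        problems.append("embargoed access needs an end date")
    # Public content needs a license; including embargoed content here
    # is an assumption, since it becomes public when the embargo ends.
    if access in {"open", "embargoed"} and not license_id:
        problems.append("publicly available content needs a license")
    return problems
```

Calling `check_access("embargoed", license_id="cc-by-4.0")` reports the missing end date, while adding an `embargo_end` value clears the list.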

What are the policies on file preservation and access at Zenodo?

Is there support available from a human?    

  • Yes. Contact https://zenodo.org/support.

Where is the content physically held?

  • Files are stored in the CERN Data Center in Switzerland, which provides long-term preservation.
  • CERN has considerable knowledge and experience in operating large scale digital repositories.
  • Files and metadata are kept in multiple online and offline copies.

Can I restrict access to sensitive content on Zenodo?

  • Yes, users can deposit files under a 'restricted access' condition. This allows the user to share access to a file on Zenodo if the specified conditions are met. 
  • Restricted access files are not made publicly available, and they can be shared only with the approval of the depositor of the original file.
  • Research materials can be set to be shared with reviewers only.

 

What metadata standards does Zenodo support?

  • All metadata is stored internally in JSON format according to a defined JSON schema. Metadata is exported in several standard formats such as MARCXML, Dublin Core, and DataCite Metadata Schema (according to the OpenAIRE Guidelines).
  • All metadata is openly available under a CC0 licence.
  • All metadata is exported via OAI-PMH and can be harvested.
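
To illustrate what harvesting via OAI-PMH looks like in practice, the sketch below builds a standard `ListRecords` request URL. The `verb`, `metadataPrefix` and `set` arguments, and the `oai_dc` (Dublin Core) prefix, come from the OAI-PMH protocol itself. The endpoint address and the function name `list_records_url` are assumptions made for this example, so confirm the endpoint against Zenodo's developer documentation before relying on it.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm against Zenodo's current developer documentation.
OAI_ENDPOINT = "https://zenodo.org/oai2d"


def list_records_url(metadata_prefix: str = "oai_dc", set_spec: str = "") -> str:
    """Build an OAI-PMH ListRecords request URL (no network call is made)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional: limit the harvest to one OAI-PMH set.
        params["set"] = set_spec
    return f"{OAI_ENDPOINT}?{urlencode(params)}"
```

Opening the resulting URL in a browser or a harvesting tool should return an XML list of records in the requested metadata format.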

Does Zenodo provide persistent identifiers for content?

  • Every upload is assigned a Digital Object Identifier (DOI), which makes it easily and uniquely citable and trackable.
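
A DOI becomes a stable link when it is appended to the `https://doi.org/` resolver address. The helper below is a minimal sketch of that conversion for use in a data citation; the function name `doi_to_url` is hypothetical and the DOI in the usage note is a placeholder, not a real dataset.

```python
def doi_to_url(doi: str) -> str:
    """Turn a bare DOI (or one prefixed with 'doi:') into a doi.org link."""
    cleaned = doi.strip()
    # Drop an optional 'doi:' prefix, as often seen in reference lists.
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    return f"https://doi.org/{cleaned}"
```

For instance, `doi_to_url("doi:10.5281/zenodo.1234567")` gives `https://doi.org/10.5281/zenodo.1234567`.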

Read more about how Zenodo supports the FAIR data principles here: https://about.zenodo.org/principles/

Is there a cost to access content that has been shared on Zenodo? 

  • It is free to access content on Zenodo.
  • All metadata is openly available under a CC0 licence, and all open content is openly accessible through open APIs.

Is the content on Zenodo harvestable?

  • All open content is harvestable via OAI-PMH by third parties.

Are there any drawbacks when using Zenodo to share and preserve data?

  • As with all generalist repositories, Zenodo cannot offer specialized support for specific data types.
  • Generalist repositories lack the deep domain expertise and curation services that are usually found in specialized repositories. 
  • Given the volume and variety of content hosted on Zenodo, end users may have difficulty discovering datasets of interest while browsing.