A data citation is an entry for a dataset within the reference list of an article, book, conference proceeding, or other document. Data citations are captured by standard citation counting methods if they are included in the reference list. However, it is still common for researchers to cite data incorrectly in their reference list, or to give insufficient information about the source of their research data.
It's important to cite data in your publications, in just the same way you would articles, books, images and websites, as a dataset is a source of evidence to support your argument. The UK Data Service have provided a useful video summarising why it is important to cite data correctly.
Benefits of data citation
Transparency: Citing data is a way of clearly showing exactly which version of which dataset has underpinned or influenced research, as well as crediting those who have made the work possible by collecting the data.
Reproducibility: It helps future researchers to find out which data the researcher has used and enables the research to be reproduced to assess its integrity. Louise Corti, Director of Collections Development and Data Publishing for the UK Data Service, has written a great blog post about research reproducibility in qualitative research: Show Me the Data.
Helping track the use of the data: Researchers who [share data] want to know that the data is being used, just as any other researcher wants to know that their book or article has been used to support others’ research. In addition, bodies that fund the collection of this data want to know that their funding has produced value. It can also help researchers to gain further funding for future data collection and analysis. Susan Noble wrote a great post on finding out what people have done with the data we provide, and its impact.
Measuring impact: Researchers want their books, articles and data to make a difference to others, whether this is on future research, influencing policy or positively changing the lives of individuals, communities or society. Citing data, like citing any other research, helps [repositories] in measuring and reporting on this impact.
Source: UK Data Service Spotlight on #CiteTheData: Make the data count – Data Impact blog (ukdataservice.ac.uk)
According to the ICPSR, the elements of a data citation are:
These are the minimum elements required for dataset identification and retrieval. Fewer or additional elements may be requested by author guidelines or style manuals. Be sure to include as many elements as needed to precisely identify the dataset you have used.
Type | Example |
---|---|
Published dataset citation with an archive number: | TILDA. (2019). The Irish Longitudinal Study on Ageing (TILDA) Wave 4, 2016. [dataset]. Version 4.0. Irish Social Science Data Archive. SN:0053-05. www.ucd.ie/issda/data/tilda/wave3 |
Published dataset citation with a DOI number: | Smith, J., and Jones, P. (2023). Environmental risk factors for autism [dataset]. Dryad Digital Repository [distributor]. doi: 10.1234/abcd123 |
Unpublished dataset: | Smith, J., and Jones, P. (2023). Environmental risk factors for autism [unpublished raw dataset]. Royal College of Surgeons in Ireland. |
Published dataset citation from an organisation or research group: | Health Service Executive. (2019). General Referrals by Hospital, Department and Year 2019. [dataset]. HSE Open Data [distributor]. |
Published dataset from individual authors: | Smith, J., and Jones, P. (2023). Environmental risk factors for autism [dataset]. Dryad Digital Repository [distributor]. doi: 10.1234/abcd123 |
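
If you keep dataset details in a script or spreadsheet, the pattern used in the examples above can be assembled programmatically. The following is a minimal sketch in Python, not an official tool from any repository or style guide: the class name `DatasetCitation`, its field names, and the `render` method are hypothetical and chosen only for illustration. It assumes the element order shown in the table (author, year, title, type label, version, distributor, identifier); always check the style required by your publisher.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Hypothetical container for the elements of a dataset citation."""
    author: str
    year: int
    title: str
    distributor: str                  # repository, archive, or holding institution
    version: Optional[str] = None
    identifier: Optional[str] = None  # DOI, archive number, or URL
    published: bool = True

    def render(self) -> str:
        # The type label follows the wording used in the examples above.
        label = "[dataset]" if self.published else "[unpublished raw dataset]"
        parts = [f"{self.author} ({self.year}).", f"{self.title} {label}."]
        if self.version:
            parts.append(f"Version {self.version}.")
        if self.published:
            parts.append(f"{self.distributor} [distributor].")
        else:
            # Unpublished data: name the holding institution only.
            parts.append(f"{self.distributor}.")
        if self.identifier:
            parts.append(self.identifier)
        return " ".join(parts)


# The placeholder dataset from the table above, published and unpublished.
published = DatasetCitation(
    author="Smith, J., and Jones, P.",
    year=2023,
    title="Environmental risk factors for autism",
    distributor="Dryad Digital Repository",
    identifier="doi: 10.1234/abcd123",
)
unpublished = DatasetCitation(
    author="Smith, J., and Jones, P.",
    year=2023,
    title="Environmental risk factors for autism",
    distributor="Royal College of Surgeons in Ireland",
    published=False,
)

print(published.render())
print(unpublished.render())
```

Optional elements such as the version and identifier are simply left out when they are not supplied, so the same record can produce a shorter or fuller citation depending on what is known about the dataset.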
For a deep dive into data citation, see: Ball, A. & Duke, M. (2015). ‘How to Cite Datasets and Link to Publications’. DCC How-to Guides. Edinburgh: Digital Curation Centre. Available online: /resources/how-guides