According to Science Europe, when developing a data management plan, the second topic researchers are required to address is "Documentation and data quality", which broadly encompasses two main questions:
What metadata and documentation (e.g. data collection methodology and data organisation) will accompany data?
What data quality control measures will be used?
Research data files and folders should be labelled and organised in a systematic and consistent way so that they are easy to find, both for you and for others in your research team. As research becomes more collaborative, it is essential to keep all file names consistent within a research project and to keep track of changes and edits to files via the file name. All researchers involved in a project should follow the same file naming conventions, and file names should be independent of the location of the file on a computer. It is generally recommended that file and folder names be concise but informative enough to describe the contents of the file. Common elements that should be considered when naming files include:
It is also recommended to use lowercase letters and to avoid spaces in file names.
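As an illustration of these conventions, the Python sketch below assembles a file name from a few commonly used elements. The choice and order of elements (a project acronym, a short description, an ISO 8601 date and a zero-padded version number) and the function name `make_file_name` are assumptions for this example, not a prescribed standard; adapt them to the convention your team agrees on.

```python
import re
from datetime import date


def make_file_name(project: str, description: str, created: date,
                   version: int, extension: str) -> str:
    """Build a name of the form project_description_YYYY-MM-DD_vNN.ext.

    The elements and their order are an illustrative choice only.
    """
    def slug(text: str) -> str:
        # Lowercase, then replace every run of characters other than
        # a-z and 0-9 (spaces, punctuation) with a single hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    parts = [
        slug(project),
        slug(description),
        created.isoformat(),    # ISO 8601 date: YYYY-MM-DD
        f"v{version:02d}",      # zero-padded so v02 sorts before v10
    ]
    return "_".join(parts) + "." + extension.lower().lstrip(".")
```

For example, `make_file_name("NCR", "Interview Transcript 03", date(2020, 12, 1), 2, "docx")` returns `ncr_interview-transcript-03_2020-12-01_v02.docx`. Writing the date as YYYY-MM-DD and padding the version number means that files sharing the same project and description sort chronologically and by version when listed alphabetically.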
Similar to consistent file naming conventions, a meaningful folder structure is a key element of project and data management and will make it much easier for you to locate and organise relevant documents. This is particularly important if you are working as part of a larger research group where many people will be accessing the files over the course of the project.
The folder structure strategy you implement will depend on the plan and organisation of the project, as well as on your own personal preferences. All material relevant to the data should be stored in the data folders, including detailed information on the data collection and data processing procedures. It is recommended to limit folder nesting to three or four levels and to keep fewer than ten items in each folder.
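If you want to check an existing project against this rule of thumb, a short script can walk the folder tree and flag problems. The sketch below is one way to do it in Python; the function name `check_folder_structure` and its default limits (at most four levels below the project folder, at most nine items per folder) are illustrative assumptions.

```python
from pathlib import Path


def check_folder_structure(root: Path, max_depth: int = 4,
                           max_items: int = 9) -> list[str]:
    """Report folders nested more than max_depth levels below root,
    or containing more than max_items entries (files plus subfolders)."""
    root = Path(root)
    problems = []
    # Visit the root folder and every folder beneath it.
    for folder in [root, *(p for p in root.rglob("*") if p.is_dir())]:
        # Depth is measured relative to root; root itself is level 0.
        depth = len(folder.relative_to(root).parts)
        if depth > max_depth:
            problems.append(f"{folder}: nested {depth} levels deep")
        n_items = sum(1 for _ in folder.iterdir())
        if n_items > max_items:
            problems.append(f"{folder}: contains {n_items} items")
    return problems
```

Running `check_folder_structure(Path("my_project"))` (a hypothetical project folder) returns an empty list when the tree stays within both limits; otherwise each string names a folder and the limit it exceeds.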
Managing different versions of your data can be tricky, but version control is a key step in good research data management, and project management overall. You should always keep original versions of data files, or keep documentation that allows the reconstruction of original files. All changes to the original versions should be documented, and this can be achieved in several ways:
For more information please see the CESSDA Data Management Expert Guide: File naming and folder structure.
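One simple way to keep earlier versions intact while documenting every change is to create each new version as a copy and record what changed in a running change log. The Python sketch below assumes a CSV change log with the columns date, from, to, author and change; the function `new_version` and that log layout are hypothetical choices for illustration.

```python
import csv
import shutil
from datetime import date
from pathlib import Path


def new_version(current: Path, new: Path, author: str, change: str,
                log_file: Path) -> Path:
    """Copy `current` to `new` so that edits are made on the copy, leaving
    the earlier version untouched, and append an entry to a change log."""
    current, new, log_file = Path(current), Path(new), Path(log_file)
    if new.exists():
        raise FileExistsError(f"{new} already exists; refusing to overwrite")
    shutil.copy2(current, new)  # copy2 also preserves file timestamps
    write_header = not log_file.exists()
    with log_file.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        if write_header:
            writer.writerow(["date", "from", "to", "author", "change"])
        writer.writerow([date.today().isoformat(), current.name, new.name,
                         author, change])
    return new
```

Because the function refuses to overwrite an existing file, an earlier version is never modified in place, and the change log gives a readable history of how each version was derived from the previous one.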
For more information please see the CESSDA Data Management Expert Guide: Documentation and metadata.
Project-level documentation explains the aims of the study, the research questions or hypotheses, the methodologies used, the instruments and measures used, and so on. The questions that your project-level documentation should answer are:
Data-level or object-level documentation provides information about individual objects, such as pictures, interview transcripts or variables in a database. You can embed data-level information in the data files themselves. For example, for interviews it is best to write down the contextual and descriptive information about each interview at the beginning of each file. For quantitative data, variable and value names can be embedded within the data file itself.
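For interview data, the header can be written programmatically so that every transcript carries the same fields in the same order. The Python sketch below is illustrative: the field names used in the example (interview ID, date, interviewer) and the helper names `write_transcript` and `read_transcript_header` are assumptions, not a required template.

```python
from pathlib import Path

# Line separating the contextual header from the transcript itself.
DIVIDER = "-" * 40


def write_transcript(path: Path, context: dict[str, str], body: str) -> None:
    """Write a transcript that starts with one "Field: value" line per
    contextual field, followed by a divider and the transcript text."""
    header = [f"{field}: {value}" for field, value in context.items()]
    text = "\n".join(header + [DIVIDER, body.rstrip("\n")]) + "\n"
    Path(path).write_text(text, encoding="utf-8")


def read_transcript_header(path: Path) -> dict[str, str]:
    """Recover the contextual fields from the top of a transcript."""
    context = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line == DIVIDER:
            break  # everything after the divider is interview content
        field, _, value = line.partition(": ")
        context[field] = value
    return context
```

Keeping the header in a fixed "Field: value" layout means the same information is readable by a person opening the file and recoverable by a script, for example when compiling a list of all interviews in a project.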
For quantitative data, the following information should be documented:
For qualitative data, the following information should be documented:
According to the UK Data Service, "metadata can describe the content, context and provenance of datasets in a standardised and structured manner, typically describing the purpose, origin, temporal characteristics, geographic location, authorship, access conditions and terms of use of a dataset". Rich metadata enhance the findability, interoperability and reusability of your data. To comply with the FAIR Principles, metadata should be accessible wherever possible, even if the data themselves are not. Metadata are intended to be machine-readable, but in many cases you do not need to generate them yourself. When you submit data to a trusted data repository or archive, it will often generate machine-readable metadata for you, or provide you with a template or a required standard to use. If not, you should follow the relevant disciplinary standards and controlled vocabularies.
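Checking values against a controlled vocabulary before submission catches simple errors such as misspelled or wrongly capitalised terms. The Python sketch below uses the DCMI Type Vocabulary, whose twelve terms are listed in the code, as the example vocabulary; the record layout (a plain dictionary mapping field names to values) and the function `check_vocabulary` are assumptions for illustration.

```python
# The twelve terms of the DCMI Type Vocabulary.
DCMI_TYPES = frozenset({
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
})


def check_vocabulary(record: dict[str, str], field: str,
                     vocabulary: frozenset[str]) -> list[str]:
    """Return a list of problems with record[field]; empty if the value
    is a term from the controlled vocabulary."""
    value = record.get(field)
    if value is None:
        return [f"missing required field '{field}'"]
    if value not in vocabulary:
        # Suggest a term that differs only in case, e.g. "dataset".
        matches = [t for t in vocabulary if t.lower() == value.lower()]
        hint = f" (did you mean '{matches[0]}'?)" if matches else ""
        return [f"'{value}' is not a term in the vocabulary for '{field}'{hint}"]
    return []
```

The same function works with any vocabulary expressed as a set of permitted terms, so a discipline-specific list can be swapped in for `DCMI_TYPES`.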
FAIRsharing is a curated, informative and educational resource on data and metadata standards, inter-related to databases and data policies.
A detailed list of discipline-specific metadata standards has been compiled by the Digital Curation Centre (DCC).
The RDA Metadata Standards Directory contains widely used metadata standards in the Arts & Humanities, Engineering, Life Sciences, Physical Sciences & Mathematics, Social & Behavioral Sciences and General Research Data.
Dublin Core is a metadata standard consisting of 15 “core” metadata elements (outlined below). It is one of the simplest and most widely used metadata schemas. The Dublin Core standard includes a definition of each metadata element stating what kind of information should be recorded and how. Many of the elements have suggested controlled vocabularies associated with them. You can create machine-readable metadata using the Dublin Core Metadata Generator. This useful tool can create both simple and advanced metadata, which are converted into a machine-readable XML file.
Dublin Core Element | Definition | Example |
---|---|---|
Title | The name given to the resource. Typically, a Title will be a name by which the resource is formally known. | A Nurse's Guide to Cancer Research |
Creator | An entity primarily responsible for making the content of the resource. Examples of a Creator include a person, an organization, or a service. | Murphy, Aine |
Date | A date associated with an event in the life cycle of the resource. Typically, Date will be associated with the creation or availability of the resource. Recommended best practice for encoding the date value is defined in a profile of ISO 8601 [Date and Time Formats, W3C Note] and follows the YYYY-MM-DD format. | 2020-12-01 |
Description | An account of the content of the resource. Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content. | Illustrated guide to cancer research and funding, with particular reference to the role of nurses |
Rights | Information about rights held in and over the resource. Typically a Rights element will contain a rights management statement for the resource, or reference a service providing such information. Rights information often encompasses Intellectual Property Rights (IPR), Copyright, and various Property Rights. | Access limited to members |
Type | The nature or genre of the content of the resource. Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the DCMI Type Vocabulary). | Text |
Language | A language of the intellectual content of the resource. Recommended best practice for the values of the Language element is defined by RFC 3066 which, in conjunction with ISO 639, defines two- and three-letter primary language tags with optional subtags. | en-GB |
Contributor | An entity responsible for making contributions to the content of the resource. Examples of a Contributor include a person, an organization or a service. | Murphy, Aine |
Relation | A reference to a related resource. Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system. | 2019 book "Oncology Nurse Navigation" by Lillie D. Shockney |
Source | A reference to a resource from which the present resource is derived. The present resource may be derived from the Source resource in whole or in part. Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system. | Interviews with Irish nurses between 2005 and 2015 in the Irish Social Science Data Archive (ISSDA) |
Coverage | The extent or scope of the content of the resource. Coverage will typically include spatial location (a place name or geographic co-ordinates), temporal period (a period label, date, or date range) or jurisdiction (such as a named administrative entity). Recommended best practice is to select a value from a controlled vocabulary (for example, the Thesaurus of Geographic Names). | Dublin, Ireland. 2005-2015 |
Subject | The topic of the content of the resource. Typically, a Subject will be expressed as keywords or key phrases or classification codes that describe the topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme. | Cancer research and funding |
Identifier | An unambiguous reference to the resource within a given context. Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Examples of formal identification systems include the Uniform Resource Identifier (URI) (including the Uniform Resource Locator (URL)), the Digital Object Identifier (DOI) and the International Standard Book Number (ISBN). | ISBN:0385424728 |
Format | The physical or digital manifestation of the resource. Typically, Format may include the media-type or dimensions of the resource. Format may be used to determine the software, hardware or other equipment needed to display or operate the resource. | Book, 1989 pages |
Publisher | The entity responsible for making the resource available. Examples of a Publisher include a person, an organization, or a service. | RCSI |
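To show what a machine-readable version of such a record can look like, the Python sketch below serialises a record to XML using the standard library's `xml.etree.ElementTree` and the Dublin Core element namespace `http://purl.org/dc/elements/1.1/`. The function name `dublin_core_xml` and the unqualified `metadata` wrapper element are assumptions; a repository or the Dublin Core Metadata Generator may use a different wrapper, so check what your archive expects.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# The 15 elements of the Dublin Core Metadata Element Set, in standard order.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
)


def dublin_core_xml(record: dict[str, list[str]]) -> str:
    """Serialise a Dublin Core record to an XML string.

    `record` maps lowercase element names to lists of values, because
    every Dublin Core element may be repeated (e.g. several creators)."""
    unknown = sorted(set(record) - set(DC_ELEMENTS))
    if unknown:
        raise ValueError(f"not Dublin Core elements: {unknown}")
    ET.register_namespace("dc", DC_NS)  # write dc:title rather than ns0:title
    root = ET.Element("metadata")       # assumed wrapper element
    for name in DC_ELEMENTS:            # emit elements in the standard order
        for value in record.get(name, []):
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    ET.indent(root)                     # pretty-print (Python 3.9+)
    return ET.tostring(root, encoding="unicode")
```

For example, `dublin_core_xml({"title": ["A Nurse's Guide to Cancer Research"], "date": ["2020-12-01"], "language": ["en-GB"]})` produces a `metadata` element with `dc:title`, `dc:date` and `dc:language` children, using values from the table above. Rejecting unknown element names catches typos such as `titel` before the file is submitted.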