In the EU, "‘research data’ means documents in a digital form, other than scientific publications, which are collected or produced in the course of scientific research activities and are used as evidence in the research process, or are commonly accepted in the research community as necessary to validate research findings and results" (Article 2(9), Directive (EU) 2019/1024 on open data and the re-use of public sector information).
According to Science Europe, when developing a data management plan, the first topic researchers are required to address is "Data description and collection or re-use of existing data", which broadly encompasses two main questions:
What data will be collected or produced?
How will new data be collected or produced and/or how will existing data be re-used?
Although research data can take many forms and are often discipline-specific, at a basic level they can be described as "any information that has been collected, observed, generated or created to validate original research findings" (University of Leeds). Common examples of research data include measurements, experimental results, fieldwork observations, interview recordings and images. Although usually digital, research data also include non-digital formats such as laboratory notebooks.
When choosing file formats for research data it's important to consider whether the format is open and/or ubiquitous. File formats that are open or non-proprietary will tend to remain accessible, even if the software that created them is no longer available. Therefore, the use of closed proprietary formats will not normally be appropriate. However, formats which are ubiquitous or have become the default standard within a discipline, whether proprietary or not, are also more likely to be maintained into the future. It may be useful to store your data using one format for data collection and analysis and also in a more open or accessible format for sharing or archiving once your project is complete. Many data archives and repositories will already have recommended file formats based on best practice within the disciplines they support.
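As an illustration of keeping a second copy in an open format, the following minimal Python sketch exports a small table of records to UTF-8 CSV and then reads the file back to confirm that no values were lost. The sample records, field names and file name are hypothetical and used only for illustration. CSV is chosen here as one common open, plain-text format, not as a recommendation from any particular repository.

```python
import csv
import tempfile
from pathlib import Path


def export_to_csv(records, fieldnames, csv_path):
    """Write a list of dict records to a UTF-8 CSV file (an open, plain-text format)."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)


def read_csv(csv_path):
    """Read the CSV back as a list of dicts; every value comes back as a string."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Hypothetical working data; in practice this would be loaded from the
# format used during data collection and analysis.
records = [
    {"sample_id": "S001", "temperature_c": 21.4, "site": "North field"},
    {"sample_id": "S002", "temperature_c": 19.8, "site": "River bank, east"},
]
fieldnames = ["sample_id", "temperature_c", "site"]

# Write the archive copy to a temporary directory for this demonstration.
archive_dir = Path(tempfile.mkdtemp())
archive_copy = archive_dir / "measurements.csv"
export_to_csv(records, fieldnames, archive_copy)

# Check that the archived copy preserves every value (compared as text,
# because CSV does not store data types).
restored = read_csv(archive_copy)
expected = [{key: str(value) for key, value in row.items()} for row in records]
round_trip_ok = restored == expected
print(round_trip_ok)
```

Because CSV stores everything as text, information such as data types and units has to be recorded separately, for example in the documentation that accompanies the dataset.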
When choosing a file format you should consider the following:
If you are unsure which format you should use, the UK Data Service provides the following guidelines:
Contemporary research is often collaborative, and reusing existing research data has become common practice in many disciplines. Reuse is convenient and cost-effective, but it is essential that you take the time to familiarise yourself with the data and check the accompanying documentation for collection procedures, data cleaning procedures, usage licenses and other technical information to make sure the data are suitable for your research. Reusing existing research data also does not exempt them from GDPR or any other relevant regulatory and ethical policies that researchers must comply with.
The Library of Congress identified preferred and acceptable file formats for textual works and musical compositions, still image works, audio works, moving image works, software and electronic gaming and learning, datasets/databases and websites.
Guidance on file formats recommended and accepted by the UK Data Service for data sharing, reuse and preservation.
Preferred formats identified by the UCD Digital Library and Repository, which facilitate processing, storage and dissemination of data, assuring both the usability and longer-term durability of the data.