To ensure the preservation of data and its metadata, funding agencies have made data management plans (DMPs) a prerequisite for funding approval. DMPs need to be tailored to the scientific fields in which they are employed, and drafting one consequently grows quickly into a complex and challenging task. However, some aspects of DMPs are common to all fields, and software tools that facilitate compliant data handling are becoming increasingly important.
Despite the strong growth in (primary) data over the last 25 years, little to no attention has been paid to compliant data management. This has become a crucial issue, since labs have acquired and assembled masses of data throughout the years that today are hardly accessible, let alone reusable.
Questions like “How do I find the data that is relevant for proposal xy and was measured half a year ago?” and “PostDoc Sarah Smith left the lab a couple of months ago. Now I need data, results and pictures from her project for my next talk at the zy conference. I know they are saved somewhere. Where can I find them? How did she name the files? Are there further data or results related to them? How am I going to hand over her research and results to her successor?” have become frequent issues in everyday research.
In order to receive funding, applicants need to demonstrate compliant data management
As a result, more and more funding agencies make data management plans (DMPs) mandatory for funding applications. The purpose of a DMP is to outline the handling of research data during and after a project in order to ensure the preservation of data and its metadata.
DMPs require a lot of attention to keep them at a sufficiently high standard
DMPs are especially helpful when prepared before data is acquired, ensuring consistent data organization, annotation and formats. However, coming up with a sophisticated DMP that lets scientists focus on their research and does not require too much effort is not that easy. It takes a lot of thought, time and energy to get it started, and even more to maintain it and keep it at a high standard. Therefore, most PIs feel overwhelmed by the task of coming up with a maintainable DMP.
“The guidelines provided by funding agencies for DMPs are quite vague and ignore crucial details which differ from one scientific discipline to the other.”
By trying to cover various research disciplines, from physics to social studies, from theology to law, the guidelines provided by funding agencies and universities are quite vague and ignore crucial details that differ from one discipline to another. So, what should a DMP include, and which details matter specifically for spectroscopic data to facilitate its accessibility?
DMPs describe the project, data collection and handling as well as ethics and legal compliances
In general, a DMP can be divided into three major parts: (I) a general, descriptive introduction to the project and its data; (II) a detailed data organization system outlining data collection and handling during and after the project; and (III) administrative, ethical and legal compliance concerning data sharing, safeguarding and responsibilities. Checklists for DMPs may be found here and here.
The general introduction is straightforward and requires a basic description:
- of the project including
- the nature of the research project and
- the research questions that the project will address
- the type of data that will be acquired including
- existing types of data that can be reused in the project,
- the format and size of the data.
The essential data organization section includes details on data collection, documentation, management, preservation as well as the handling of metadata. This refers to the way the data will be:
- acquired including used standards or methodologies
- evaluated and saved including
- description of a storage structure that includes a structured naming and versioning system of samples and datasets
- a plan for long-term storage
- the data’s retrievability and
- metadata documentation:
- required metadata to read and interpret data in the future
- definition of metadata standards
- quality assurance processes.
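As an illustration of the structured naming and versioning system mentioned above, such a convention could be encoded in a small helper so every dataset file name is built the same way. The field order and separators here are a hypothetical example, not a prescribed standard:

```python
from datetime import date

def dataset_filename(sample_id: str, method: str, version: int,
                     acquired: date, extension: str = "csv") -> str:
    """Build a structured, chronologically sortable dataset file name.

    Pattern (hypothetical convention): YYYY-MM-DD_<sample>_<method>_vNN.<ext>
    """
    return (f"{acquired.isoformat()}_{sample_id}_{method}"
            f"_v{version:02d}.{extension}")

# Example: a UV/Vis measurement of sample "S042", second revision
name = dataset_filename("S042", "uvvis", 2, date(2021, 3, 15))
# → "2021-03-15_S042_uvvis_v02.csv"
```

Because the date comes first and versions are zero-padded, a plain alphabetical directory listing already sorts the data chronologically and by revision.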
The final part describing safeguarding, data access and sharing has to list:
- precautionary steps to safeguard and backup data including
- additional services for data storage
- backup plan with assigned responsibilities
- in case of sharing data:
- who owns the data,
- defined restrictions on the reuse and sharing of third-party data, e.g. to publish or seek patents
- what data will be shared and in what way will it be shared
Finally, the plan must include a definition of the project’s data management responsibilities.
Software tools facilitate data care
While the descriptive first section (I), the data acquisition (II.1) and the management responsibilities are straightforward and self-explanatory, the core data organization is trickier to handle. Fortunately, there are software tools that facilitate data monitoring, saving, safeguarding and sharing (II.2-III.2); on the downside, none of these tools offers a full solution.
An ideal DMP tool automates data input steps
An ideal DMP tool assists scientists by making their (primary) data findable and accessible. It does so by creating a repository of primary data, which benefits scientists and PIs most if the data is deposited automatically as soon as it is generated. Ideally, as much metadata as possible is automatically associated with the uploaded primary data by parsing acquisition files and digital notes for relevant parameters. This automation avoids gaps and missing data as well as erroneous input caused by human error.
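A minimal sketch of such automatic metadata extraction, assuming a plain key–value parameter file of the kind many instruments write alongside their raw data (the field names below are invented for illustration):

```python
def parse_acquisition_params(text: str) -> dict:
    """Extract metadata from a simple 'key = value' parameter file.

    Lines without '=' and comment lines are skipped; keys are
    normalized to lowercase so downstream tools can rely on them.
    """
    metadata = {}
    for line in text.splitlines():
        if "=" not in line or line.lstrip().startswith("#"):
            continue
        key, _, value = line.partition("=")
        metadata[key.strip().lower()] = value.strip()
    return metadata

# Hypothetical snippet of an acquisition file:
params = parse_acquisition_params(
    "# UV/Vis run\nOperator = S. Smith\nWavelength_Start = 200 nm\n"
)
# params == {"operator": "S. Smith", "wavelength_start": "200 nm"}
```

A repository tool would run a parser like this on every uploaded file, so the operator, instrument settings and date are recorded without anyone typing them in.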
The ideal tool also allows scientists to share data, combined with its metadata, in customized ways: e.g. internally within the research group or across sites, but also with collaborators, journals and databases. Such a tool would intrinsically handle the above-mentioned points concerning data collection, retrievability, metadata documentation, long-term storage (II.2) and data sharing (III.2). Data backup (III.1) could easily be achieved by mirroring the content on a second server. While not strictly necessary for GLP (good laboratory practice), a tool featuring an interactive viewer would allow scientists to peruse measurements easily.
ELNs are designed to manage lab procedures, samples and syntheses
Compared to this ideal tool, today’s available supporting software tools, ELNs (electronic lab notebooks) and repositories, have limitations. ELNs were originally designed to manage samples, compositions, syntheses, reactions and storage and to document lab procedures, and they serve these use cases best. Even though some ELNs offer manual experiment upload, they offer neither autonomous primary data input nor sharing features, nor are they designed to handle spectroscopic data.
Data repositories create a database infrastructure of primary data
Although most data repositories do not offer automated data storage, they seem to be the best tool for supporting researchers with their spectroscopic data, since their very purpose is to create a database infrastructure that collects, manages and saves (primary) data for analysis and sharing.
Thus, to cover the aspects of a good DMP for spectroscopic data, an automatic data repository represents the best solution, possibly complemented with an ELN for sample documentation.