In recent years, governmental organizations and funding bodies have held the scientific community to ever higher standards of data retention. A few years ago it was enough to store primary data for 10 years after publication; today researchers are expected to keep data findable, accessible, interoperable, and reusable (FAIR), and in the near future to deposit it in Open Access databases. For this to work, labs will need more developed internal data administration, together with an accepted directive within the group to save and store data using defined procedures and protocols. Such procedures have evolved over the last few years, but there is still some way to go. To put it bluntly: dumping the experimental data in a folder on some lab computer is not sufficient anymore, and honestly, never has been.
A major issue in dealing with data is the lack of adequate data management (recording) tools. While handwritten notes will probably always play a role, scientists who use nothing else will be unable to meet the requirements for today’s digitally produced data. Researchers therefore have to start modernizing their scientific data handling.
Fortunately, a growing number of software platforms are available that facilitate data intake, storage, organization, and communication, and even allow processes to be automated. These tools usually focus on specific application fields.
Currently, platforms are categorized as either laboratory information management systems (LIMSs) or electronic laboratory notebooks (ELNs). To meet today’s data management standards, a third, complementary class of tools is emerging: the Scientific Data Management System (SDMS), not to be confused with the all-encompassing Open Access data repositories discussed in academic circles and meant for a broader audience.
LIMSs were originally designed to improve lab efficiency by managing workflows relating to samples and associated data. Nowadays, they are used for workflow management, record keeping, and inventory management, and thereby help standardize operations, tests, and procedures while providing control over the process.
ELNs are the digital successors of paper-based laboratory notebooks, and are sufficiently secure and reliable to serve as a reference in legal matters. Ideally, ELNs cover all events accompanying a sample’s lifecycle: planning and performing the sample’s preparation and investigation, including details of both the theoretical and the actually performed reactions, anything else that happens in the lab, and an overview of all experiments run. Most ELNs allow a representation of the experimental data to be uploaded manually in some form (e.g. PDF files), but fall short of handling experimental data and metadata storage adequately.
Experience has shown that nothing less than a dedicated SDMS will suffice to keep scientific data findable and accessible, with enough metadata to ensure its reusability. A flexible permissions system is required to regulate user access, together with search and filter functions that narrow the datasets down to those of immediate interest.
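To make the combination of metadata, permissions, and filtering more concrete, here is a minimal sketch in Python. It does not describe any particular SDMS product: the `Dataset` record, the owner-plus-readers permission rule, and the `search` function are all hypothetical names and simplifications chosen for illustration, and the users and metadata values are invented sample data.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """One stored measurement plus the metadata needed to find and reuse it."""
    dataset_id: str
    owner: str
    metadata: dict
    readers: set = field(default_factory=set)  # users granted read access


def can_read(user: str, dataset: Dataset) -> bool:
    """Assumed rule: the owner and explicitly listed readers may see a dataset."""
    return user == dataset.owner or user in dataset.readers


def search(datasets, user, **filters):
    """Return datasets the user may read whose metadata matches every filter."""
    return [
        ds for ds in datasets
        if can_read(user, ds)
        and all(ds.metadata.get(key) == value for key, value in filters.items())
    ]


# Invented example catalog, for illustration only.
catalog = [
    Dataset("ds-001", "alice", {"instrument": "NMR", "sample": "S-17"}, {"bob"}),
    Dataset("ds-002", "alice", {"instrument": "HPLC", "sample": "S-17"}),
    Dataset("ds-003", "bob", {"instrument": "NMR", "sample": "S-42"}),
]

nmr_for_bob = search(catalog, "bob", instrument="NMR")
print([ds.dataset_id for ds in nmr_for_bob])  # ['ds-001', 'ds-003']
```

The permission check runs before the metadata filters, so a search never reveals the existence of datasets the user is not allowed to see. A real system would likely add groups, roles, and write permissions on top of this.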
Such an SDMS has a chance of fulfilling its purpose only if data is gathered comprehensively: cherry-picking measurements for safekeeping unduly increases the uploading effort and will, in the long term, lead to missing data. Reliable and comprehensive data upload can be achieved only through an automated process in which data is gathered from instruments straight after the measurements. Calibration measurements and other ancillary data should be hidden from searches by default, yet remain available when needed.
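One way such automated gathering might look is sketched below in Python. The sketch rests on several assumptions that are not part of the discussion above: the instrument writes its files to an export folder, an in-memory dictionary stands in for the SDMS storage, and calibration runs are recognizable by a `cal_` filename prefix. The function names `ingest_folder` and `find` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def ingest_folder(export_dir, registry):
    """Register every file not yet known, keyed by its content checksum."""
    added = []
    for path in sorted(Path(export_dir).iterdir()):
        if not path.is_file():
            continue
        checksum = hashlib.sha256(path.read_bytes()).hexdigest()
        if checksum in registry:
            continue  # identical content was already ingested
        registry[checksum] = {
            "filename": path.name,
            "size_bytes": path.stat().st_size,
            # Assumption: calibration runs carry a "cal_" filename prefix.
            "is_calibration": path.name.lower().startswith("cal_"),
        }
        added.append(checksum)
    return added


def find(registry, include_calibration=False):
    """List stored filenames, hiding calibration runs unless asked for."""
    return sorted(
        record["filename"]
        for record in registry.values()
        if include_calibration or not record["is_calibration"]
    )


# Demo with a temporary folder standing in for an instrument export folder.
with tempfile.TemporaryDirectory() as export_dir:
    (Path(export_dir) / "cal_blank.csv").write_text("0,0\n")
    (Path(export_dir) / "sample_S17.csv").write_text("1,2\n")
    registry = {}
    ingest_folder(export_dir, registry)
    print(find(registry))                            # ['sample_S17.csv']
    print(find(registry, include_calibration=True))  # ['cal_blank.csv', 'sample_S17.csv']
```

Everything in the folder is ingested, so nobody has to decide what is worth keeping, and the calibration flag only affects what a default search shows. Keying the registry by checksum makes repeated runs harmless, at the cost of treating two files with identical content as one entry.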
By implementing a Scientific Data Management System and using it together with an ELN, virtually all aspects of research data creation can be documented in a FAIR manner. Ideally, the SDMS and ELN work in conjunction via an API or web-based communication protocols, allowing seamless integration of automatically gathered measurement data and manually entered protocols. Furthermore, when the time comes to upload data into a public Open Access data repository, the internal SDMS would enable researchers to do so at the click of a button.