Today’s solutions used for data distribution were never meant to serve this purpose and only found application due to lack of alternatives. LOGS for Service Facilities offers a safe and easy way to automatically distribute data from multiple instruments to the scientists. It securely and quickly uploads experimental data and its metadata to a LOGS server that can be set up locally or online.
Service facilities all over the world are searching for a secure and easy data distribution tool that is designed specifically for this purpose. However, no tool currently meets the different expectations of researchers, PIs and service facility managers. Instead, facilities fall back on compromise solutions that have limitations and downsides.
Four essential features of a good data distribution tool for a service facility
In close work with several service facility managers, four essential features emerged that a good data distribution tool must offer:
- Most importantly, it must meet its main purpose: data distribution. Even though this property seems self-evident for such a tool, it is one of the more challenging requirements. Acquired data files can be very large, especially in NMR spectroscopy and even more so in cryo-electron microscopy, which makes it difficult to forward the data to the researcher. Thus, the service facility tool must not impose a data size limit. In addition, automation of the data distribution process is essential, since it not only saves time and money, but also avoids unwanted mistakes caused by human error. Moreover, the distributed data must not lose its context and should therefore carry as much metadata as possible to facilitate its later accessibility and reusability.
- Data security is essential in scientific research. All data has to be protected against unauthorized access from outside the research facility as well as from within, which means that the visibility and accessibility of experimental data and metadata have to be limited to authorized people only.
- Scientists focus mainly on their daily research. Instead of waiting for results, researchers want to move their projects forward. Therefore, a good data distribution tool needs to deliver experimental data reliably and as soon as possible after the measurement is concluded.
- A last major feature of a good service facility tool is some kind of instrument monitoring, which may indicate the occupancy of instruments, who is measuring, what is being measured and when it will be finished. (This, however, needs to be consistent with the data security requirement in the second point.) Furthermore, a statistical tool that summarizes instrument usage and user occupancy rates, as well as a billing tool for experiments and measurement time, may be helpful features for service facilities.
Commonly used solutions have limitations and downsides
Due to the need to send acquired data to the scientists in some way, various approaches have grown organically. However, most of them are compromises that do not represent a fully satisfying solution. Sending datasets via email, Dropbox or using a USB stick causes issues regarding file sizes, data security and missing context. For instance, sending an NMR experiment via email to the user might still work as long as it is a 1D experiment, but trying to send a 2D NOESY via email can be a challenging endeavor. Multiple institutions have restricted the usage of cloud-based storage systems such as Dropbox and Google Drive to ensure data security. Moreover, some IT departments (especially in industry) do not allow the usage of USB sticks, because of potential malware infection. Choosing one of these data distribution approaches also results in a lot of manual work for either the employees of the service facility or the scientists themselves.
A more common solution is to download experiments via SFTP connections, which appears to mean less work for scientists and service facility employees. However, an SFTP connection enables remote access to the spectrometer computer, resulting in a security hazard for the hardware. Obviously, this is something IT administrators strongly dislike.
Data is missing context and organization
Transferring data with any of the solutions mentioned above leaves the acquired data without any context, such as metadata and experimental parameters, which hampers its later reusability and accessibility. Moreover, because the data is not organized, datasets are downloaded multiple times, resulting in redundant data on different computers, servers and storage devices.
LOGS for Service Facilities meets the expectations for a data distribution tool
The solutions mentioned above were never meant to serve as scientific data distribution tools and only found application for lack of a dedicated tool. LOGS for Service Facilities offers a safe and easy way to automatically distribute data from multiple instruments to the scientists. It securely and quickly uploads experimental data and its metadata to a LOGS server that can be set up locally or online. Scientists and PIs can access, view, compare and download their data in LOGS using a common internet browser. Different filter and search features make the data easily accessible and reusable, while a sophisticated permissions system restricts users to the data they are allowed to see. By checking the LOGS dashboard, scientists are able to monitor the status and preview a snapshot of their running experiments. For NMR spectroscopy, LOGS offers an IconNMR integration that promises a fully automated data distribution process. It includes automatic assignment of experiments to their users and existing samples, generation of new samples and, optionally, linking of experiments to their ELN entries.
LOGS for Service Facilities has been developed by scientists for scientists in close collaboration with several service facility managers. Further data formats and methods are continuously added to LOGS.