Research data is saved on different types of storage media, but how reliable are these systems that we use for long-term storage? What other options are there?
“We assume that digital files will last forever because they don’t degrade, but the media we store them on can and does.”
How reliable is the standard storage media that we rely on?
When it comes to data storage and backups, most of us rely on standard technologies that we are familiar with, e.g. flash drives, hard drives and CDs. However, these storage media deteriorate over time. How quickly they do so depends on how often they are used, on storage conditions such as high temperature, humidity and light exposure, and on human handling, e.g. disc labeling. For example, if you save the PhD theses of former researchers on CDs, you probably won’t be able to read them anymore after ten years. The lifetime of a CD is two to five years, and simply labeling the disc may accelerate its deterioration.
“The best way to lose your data on a CD is to label it. It starts the deterioration process immediately.”
After three years the annual failure rate of hard drives explodes
Hard drives may last up to seven years. A cloud storage provider monitored its 25,000 constantly running hard drives and observed that 90% of them survived the first three years (Source). Failures during this period were attributed to manufacturing defects and random events. After the third year, the annual failure rate jumped to 12% because the drives began to wear out. “Various components can only rotate, gyrate, and actuate so many times.” (Source)
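To get a feeling for what these numbers mean over time, here is a rough back-of-the-envelope sketch. It uses only the two figures quoted above (90% survival after three years, 12% annual failure rate afterwards) and makes two simplifying assumptions that are not part of the cited data: the early losses are spread evenly over the first three years, and the 12% rate stays constant in every later year. The function name and its parameters are made up for this illustration.

```python
def surviving_fraction(years, early_survival=0.90, late_afr=0.12, early_years=3):
    """Estimate the fraction of drives still running after `years` years.

    Assumptions (not from the cited data): early losses are spread evenly
    over the first `early_years`, and `late_afr` is constant afterwards.
    """
    if years <= early_years:
        # Geometric interpolation so that exactly `early_survival`
        # of the drives remain at the end of the early period.
        return early_survival ** (years / early_years)
    # After the early period, a fixed share of the remaining drives fails each year.
    return early_survival * (1 - late_afr) ** (years - early_years)


for year in range(1, 8):
    print(f"year {year}: {surviving_fraction(year):.1%} of drives still running")
```

Under these assumptions, only a little over half of the drives would still be running after seven years, which is why a single hard drive is a poor long-term archive.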
System obsolescence is a major challenge in long-term data storage
Floppy discs are a storage medium that lasts comparatively longer. However, they have a capacity of only 1.44 MB (Source) and are no longer supported. This leads us to another major challenge regarding long-term data availability: system obsolescence.
After a couple of years, the system required to read the data may be obsolete. “Some interfaces just don’t exist anymore,” said Thomas Youkel, chief of the enterprise systems group at the Library of Congress (Source). In that case, even data that was saved and backed up responsibly may be neither readable nor accessible because the necessary tools are missing. The best examples of this problem are floppy discs and VHS tapes. Not too long ago (at least I can still remember that time quite well), floppy discs were the go-to solution for data storage. But if you ask anyone born after 2000, they probably will not even know what you are talking about. Other examples of technology that simply disappeared are punch cards, magnetic drum memory, punched tape, laserdiscs, 8-tracks, Betamax, Zip disks, cassette tapes and MiniDiscs (Source).
How do you keep your scientific data accessible?
How are you saving scientific data of current and former PhD students, postdocs, scientists and coworkers? Are you utilizing CDs, hard or flash drives to save all this data? What time span does your storage system cover? Can you still use and access data that was saved five or ten years ago?
What is the least you want to do to secure your data?
There are two ways to secure your data, and you probably want to take advantage of both. Copying your data to, for example, an external storage device gives you a standard backup. To also guard against unrecoverable sector read errors and the failure of entire physical drives, you should ensure data redundancy via a RAID system.
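How can redundancy bring back data from a drive that has failed? The text above does not name a particular RAID level, so the following is only a toy sketch of the parity idea used by parity-based setups such as RAID 5. The three “drives” are plain byte strings and the helper name `xor_blocks` is made up for this illustration; a real RAID system works on disk blocks in hardware or in the operating system.

```python
def xor_blocks(*blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


# Two data "drives" holding blocks of the same size.
drive_a = b"thesis-1"
drive_b = b"thesis-2"

# The parity block is stored on a third drive.
parity = xor_blocks(drive_a, drive_b)

# Simulate losing drive_b: XOR-ing the surviving drive with the
# parity block reproduces the lost data.
rebuilt_b = xor_blocks(drive_a, parity)
assert rebuilt_b == drive_b
```

Note that redundancy of this kind only protects against drive failure. It does not replace a backup, because a file that is deleted or overwritten by mistake is lost on all drives at once.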
What storage system should you use?
One option to keep your data safe and accessible is to create backups manually on a hard drive. However, this approach can take up quite some time, since you need to back up frequently and keep your storage system up to date. If you don’t, the backup is not necessarily reliable.
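If you do back up manually, it is worth checking that the copy really matches the original, because a damaged copy is easy to overlook. The following is a minimal sketch of that idea using only Python’s standard library: it copies a file and compares SHA-256 checksums of the original and the copy. The function names `sha256_of` and `backup_file` and the paths in the usage example are placeholders chosen for this illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source, backup_dir):
    """Copy `source` into `backup_dir` and verify the copy by checksum."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also keeps timestamps where possible
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Backup of {source} does not match the original")
    return target
```

A call such as `backup_file("results/measurements.csv", "/mnt/external/backup")` would then either return the path of the verified copy or raise an error. Re-running the checksum comparison from time to time is also a simple way to notice when a storage medium starts to deteriorate.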
A second and easier option to ensure your data’s safety and accessibility is to use a professional cloud platform. “They’ll maintain multiple copies of your files on technology that’s frequently updated.” In addition, the data can be uploaded automatically, so you do not have to remember to perform backups yourself. The downsides are recurring payments and the fact that you entrust your data to a third party. The resulting data security concern can be mitigated by encrypting the data before it is uploaded. This requires some know-how, but once it’s set up, it works automatically.