Quite a bit, according to the experts. For one thing, what we think is permanent isn’t. Digital storage systems can become unreadable in as little as three to five years. Librarians and archivists race to copy things over to newer formats. But entropy is always there, waiting in the wings. “Our professions and our people often try to extend the normal life span as far as possible through a variety of techniques, but it’s still holding back the tide,” says Joseph Janes, an associate professor at the University of Washington Information School.
To complicate matters, archivists are now grappling with an unprecedented deluge of information. In the past, materials were scarce and storage space limited. “Now we have the opposite problem,” Janes says. “Everything is being recorded all the time.”
In principle, that could right a historic wrong. For centuries, countless people didn’t have the right culture, gender, or socioeconomic class for their knowledge or work to be discovered, valued, or preserved. But the massive scale of the digital world now presents a unique challenge. According to an estimate last year from the market research firm IDC, the amount of data that companies, governments, and individuals create in the next few years will be twice the total of all the digital data generated since the start of the computing age.
Entire schools within some universities are laboring to find better approaches to saving the data under their umbrella. The Data and Service Center for Humanities at the University of Basel, for example, has been developing a software platform called Knora to not just archive the many types of data from humanities work but ensure that people in the future can read and use them. And yet the process is fraught.
“You make educated guesses and hope for the best, but there are data sets that are lost because nobody knew they’d be useful,” says Andrea Ogier, assistant dean and director of data services at the University Libraries of Virginia Tech.
There is never enough staff or money to do all the necessary work, and formats keep changing and multiplying. “How do we best allocate resources to preserve things? Because budgets are only so large,” Janes says. “In some cases, that means stuff gets saved or stored but just sits there, uncatalogued and unprocessed, and thus next to impossible to find or access.” In other cases, archivists ultimately turn away new collections.
The formats used to store data are themselves impermanent. NASA socked away 170 or so tapes of data on lunar dust, collected during the Apollo era. When researchers set out to use the tapes in the mid-2000s, they couldn’t find anyone with the 1960s-era IBM 729 Mark 5 machine needed to read them. With help, the team ultimately tracked down one in rough shape at the warehouse of the Australian Computer Museum. Volunteers helped refurbish the machine.
Software also has a shelf life. Ogier recalls trying to examine an old Quattro Pro spreadsheet file only to find there was no readily available software that could read it.