There is a great article on Ars Technica today about the major processing effort that will be required at the National Archives when the Bush administration leaves office. The Ars Technica piece references a New York Times article on the topic from this past weekend.
This section really strikes home:
The contingency plan will entail "ingesting" the Bush White House's data into a separate system before integrating it with the ordinary archive. As the plan explains, "the current PERL [Presidential Electronic Records Library] system architecture was not scalable to actually support the volume of records that are expected from the current Presidential administration."
First, the use of quotation marks should remind us all that "ingest" means absolutely nothing to someone who is not a repository manager.
It's not just size that matters, though: the Archives will also need to process reams of information locked in some quaint proprietary formats. The RMS index, for example, "consists of an implementation of a customized older version of Documentum running on Oracle, with image files (including copies of scanned records) incorporated as objects in the database." The photos are stored in a "proprietary photo management software called MerlinOne, running on Microsoft SQL as the database engine," and it has apparently taken several months to extract the images and metadata for relinkage outside the Merlin format.
I have participated in some discussions about a potential data migration project at work. I recently saw an inventory of media formats -- not file formats, but media formats -- that the project would need to encompass, and it is lengthy. The only source I can think of for hardware to read some of the formats is eBay. That doesn't even take into account the files themselves. It's interesting how quickly a format becomes obsolete, and how many customized systems federal agencies use.