Saturday, August 30, 2008

web archiving

The Library of Congress has a phenomenal Web Capture team, staffed with very dedicated people who put a lot of effort into identifying web sites that best document an event, crawling and capturing sites through partner Internet Archive, working with cataloging to get the sites described to enhance discoverability, doing quality control to make sure the archived sites will run correctly, and then making the archived sites live for public access. Ensuring that a site is fully captured, preserved, and accessible can take a very long time.

The Web Capture team is, as they have in previous years, documenting the 2008 elections. They don't crawl sites without permission, and they always send requests. A colleague at another library sent me a link to a post and a series of comments on Wonkette that were a reaction to a LoC request to capture the site. The post itself is fine. It is more than a bit surreal to get such a request from LoC -- they're going to collect what I write? -- and making fun of it is OK.

Some of the comments, however, are another story.

The reaction to the notice of the request includes strings of profanity, vulgarity, and various exhortations to "archive this, LoC!" Some comment that it's possibly a fake request similar to a Nigerian scam, some liken it to FBI wiretapping, and one comment says that it's a waste of taxpayer dollars to have federal employees reading websites in order to identify what should be archived. One comment conjectures that by "capture," we mean print out the site and store it in a box next to the Ark of the Covenant. Some of the comments are obviously humorous and some are serious, and it's hard to tell with others.

I have a sense of humor, especially about political topics. Of course it's funny to the Wonkette participants that whatever is said, whether profound or mundane or profane, the Library of Congress will crawl it. I remember my own reaction when I was approached about submitting my email to the MCN archives covering the period when I was on its board, which contained such highlights as "The membership brochure is at the printer" and "Don't faint when you see how much the conference hotel wants to charge us for internet access." But for some reason this really struck a nerve because some of the commenters were so "f--- you, Library of Congress." That saddened and angered me.

Collecting ever-changing, interactive, born-digital resources takes a huge effort compared to collecting print materials, but we and many other libraries do it because it's an equally important form of publishing. Libraries collect whatever is relevant regardless of its form of publication. Sites like these are important because they reflect what's really being said and what people really think about the political process. What about that isn't worth collecting?

I'll cop to being a bit overly sensitive on this, but only because I place very high value on such collecting activities.
