Tuesday, June 30, 2009

LoC on iTunes

The Library of Congress now has content on iTunes U. iTunes U is the area of the iTunes Store which offers open educational audio and video content from universities and other educational institutions. The Library’s initial iTunes U content includes historical videos such as original Edison films and a series of 1904 films from the Westinghouse Works, as well as event videos such as author talks from the National Book Festival, the "Books and Beyond" series, discussions with curators, and lectures from the Kluge Center. The audio content includes Library podcast series such as "Music and the Brain," slave narratives from the American Folklife Center, and interviews with authors from the National Book Festival. The collection also includes Library-produced classroom and educational materials, such as courses from the Catalogers’ Learning Workshop.

You must be running iTunes to be able to view the LoC content.

Saturday, June 27, 2009

new BIL on SourceForge and update to BagIt spec

This week saw a couple of events around the BagIt specification and tools.

A revision of the BagIt specification went out this week. You will note that it is still 0.96 -- the revisions were only in language, to clarify some questions that had been received. There are some discussions going on about 0.97 -- join the Digital Curation Google group. I'd like to see some more activity there!

Version 3.0 of BIL, the BagIt Library for Java, was released on SourceForge this week. It's available as binary and source code.

Plus, there was the BagIt video ...

BagIt video

The first in a planned series of digital preservation videos is available on the digitalpreservation.gov site -- an introduction to BagIt! Brian Vargas did a great job as "the talent" -- i.e., the narrator -- but folks should know that Brian was not selected just for his acting experience: he wrote many of our transfer tools (like the transfer scripts on SourceForge) and is a co-author of the BagIt specification.

The video premiered this week at the annual NDIIPP Partner's Meeting to great acclaim. It's aimed at a general audience.

EDIT: The NDIIPP site has added a great new page on the Transfer Tools with a link to the video.

Friday, June 26, 2009

Chesapeake Project Legal Information Archive

I came across a very interesting resource today -- the Chesapeake Project Legal Information Archive -- and the just-released results of a study they did on archiving legal resources on the web:

The Chesapeake Project Legal Information Archive has released a comprehensive report evaluating its digital preservation efforts during the project's two-year pilot phase.

The project evaluation reveals that nearly 14 percent — or approximately one in seven — of the online publications archived between March 2007 and March 2009 have already disappeared from their original locations on the Web but, due to the project's efforts, remain accessible via permanent archive URLs. A similar analysis in 2008 showed that slightly more than 8 percent of archived titles had disappeared from their original URLs, demonstrating a dramatic increase in "link rot," or inactive URLs, among archived content over the past year.

During the two-year pilot phase, the libraries participating in the project archived more than 4,300 digital objects and tracked more than 177,000 visits to www.legalinfoarchive.org, the home of The Chesapeake Project's digital archive collections. Users of the project's Web site visited from educational, government, and military institutions in the United States, as well as from countries abroad throughout the Americas, Europe, the Middle East, Asia, Africa, Australia, and the Pacific Islands.

Not too surprisingly, .edu is the class of domain with the second-highest resource loss, after .info. Academic institutions are not always very conscientious about preserving access to their content, and with their academic term structure and the movement of faculty between institutions, web content on .edu sites is highly variable in its longevity. I don't see a characterization of how old the harvested resources were -- that can be very difficult to identify -- but it is a high percentage of link rot, and there was quite an increase from the end of the first year (slightly more than 8 percent) to the end of the second year (nearly 14 percent).
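
For anyone who wants to try a similar per-domain tally on their own list of archived links, here is a minimal sketch in Python. It is only an illustration, not the Chesapeake Project's method: the function name `loss_rate_by_tld`, the input shape (pairs of original URL and a still-available flag), and the sample URLs are all my own assumptions, and it does no network checking itself.

```python
from collections import Counter
from urllib.parse import urlparse


def loss_rate_by_tld(checks):
    """Return the fraction of lost resources per top-level domain.

    `checks` is an iterable of (original_url, still_available) pairs.
    This input shape is an assumption made for illustration only.
    """
    total = Counter()
    lost = Counter()
    for url, still_available in checks:
        host = urlparse(url).hostname or ""
        # Use the last dot-separated label of the host name as the domain class.
        tld = "." + host.rsplit(".", 1)[-1] if "." in host else "(unknown)"
        total[tld] += 1
        if not still_available:
            lost[tld] += 1
    return {tld: lost[tld] / total[tld] for tld in total}


# Made-up sample data, not figures from the report.
sample = [
    ("http://www.example.edu/report.pdf", False),
    ("http://www.example.edu/syllabus.html", True),
    ("http://www.example.gov/ruling.pdf", True),
    ("http://www.example.info/brief.pdf", False),
]
rates = loss_rate_by_tld(sample)
for tld, rate in sorted(rates.items()):
    print(f"{tld}: {rate:.0%} lost")
```

Deciding whether a link is really "lost" (a 404, a redirect to a home page, a page with changed content) is the hard part, and it is deliberately left out of this sketch.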

Download the PDF of their report.

Tuesday, June 16, 2009

milestones for the National Digital Newspaper Program

Today there was an exciting press event at the Newseum for the National Digital Newspaper Program, sponsored by the Library of Congress and the National Endowment for the Humanities. There was a great live demo, a video on digital production for the project from the University of Kentucky, and some nice speechmaking. The event promoted the milestone where the project surpassed 1,000,000 pages available at the Chronicling America site, the addition of seven new state partners, and the addition of images of illustrated newspaper supplements to the LoC Flickr Commons set (with more to come every month).

So far the AP has an article available, and there were representatives of other news outlets at the event. Check out the press release. Roy Tennant has a post that includes some of the technical specs supplied by my colleague Ed Summers. Ed and Dan Krech have done some great work to update the underlying application, improving the ingest and search functionality, adding the functionality that allows the site to be crawled, and exposing the data as RDF for a multitude of possibilities.

Edit: Here's the Washington Post article, and the official LoC blog posting.

Saturday, June 13, 2009

something odd happened today

Last weekend I went to my local public library (which I love), where I spotted a book that was on my to-be-read list. I keep a list of books I want to read, and periodically search the library's catalog to see if they have them at any of their branches. I had this book noted on my list as being held in the collection of my local branch. Depending upon how much I want to read a book, I'll put a hold on it if they have it in the collection but it isn't checked in. This book had held a middling position on my list for a while: a 2007 sequel to a science fiction novel, by a newish but award-winning author, that I liked but didn't love. I thought the sequel might be interesting. I grabbed the book off the shelf, but, in the process of wandering around and gathering up other books, I must have set it down, and it didn't make it to the self-checkout with me, something I didn't discover until I got home. Ah well, I knew I'd be back this weekend, and maybe it would still be available.

I returned today and wandered over to the shelf. It wasn't there. I decided to look the book up and see when it was due and put a hold on it this time.

It wasn't there any more. It wasn't in the catalog, and the author wasn't in the catalog either.

I left with the books I found and one that was on hold for me. I considered asking about the missing book/author, but there was quite a line and I didn't want to hold people up while I asked my crazy-conspiracy-sounding questions -- how did this author and his books disappear in the last week? And why?