Digital Eccentric: January 2009

Sunday, January 25, 2009

National Film Board of Canada puts archives online

The National Film Board of Canada (NFB) has opened up its archives: more than 500 films, clips, and trailers are now available on its new Screening Room web site. They're freely available for online viewing (there are costs for public broadcast and educational use), with more to be added regularly.

the burden of twitter

Steven Levy has written an essay for Wired about the guilt one can feel for not participating enough in one's social network: following tweets but not twittering, not blogging often enough, or not updating one's Facebook status. It's a brief but interesting read on privacy and on the weird sense of duty to keep those public lines of communication open.

Nicholas Carr has posted a very interesting reaction to Levy's essay.

There's an arrogance to sharing the details of one's life in public with strangers - it's the arrogance of power, the assumption that such details somehow deserve to be broadly aired. And as for the people, those strangers, on the receiving end of the disclosures, they suffer, through their desire to hear the details, to hungrily listen in, a kind of debasement. At the risk of going too far, I'd argue that there's a certain sadomasochistic quality to the exchange (it's a variation on the exchange that takes place between celebrity and fan). And I'm pretty sure that Levy's remorse comes from his realization, conscious or not, that he is, in a very subtle but nonetheless real way, displaying an undeserved and unappetizing arrogance while also contributing to the debasement of others.

This seems a bit strong to me, but not entirely off base. Arrogance of power? Debasement? Sadomasochistic? OK, that may be true for some who participate in social networking, just as it is for some participants in real-life communities. There is something a bit egotistical in assuming that others will follow your tweets/blog/delicious tags/flickr set/facebook. There is also something a bit creepy about the fact that, unless you require approval, complete strangers can read tweets in which you might be discussing where you are at any given time. I like to think that most people use social networking to actually keep in touch, not to obsessively stalk one another.

There's that public sharing expectations thing again. I know, I think about this a lot. People I do not know read my blog, see many (but not all) of my flickr images, and join my delicious network to see most (but again, not all) of my bookmarks. I have made a conscious decision to share these things. I had to struggle with getting over the creepiness factor. It was well over a decade ago that a woman from China, upon being introduced to me at a conference reception, exclaimed "Oh! I know who you are -- you have an interest in folk art and you like armadillos!" She had come across my personal web page (remember those?) while researching the conference speakers.

There's no turning back. There's only self-selecting your level of exposure.

Folger Library launches online image collections

The Folger Shakespeare Library just expanded access to its Digital Image Collection by offering over 20,000 images online. The collection includes books, theater memorabilia, manuscripts, art, and 218 of the Folger’s pre-1640 quarto editions of the works of William Shakespeare.

Online use is through the Luna Insight Browser -- you have to add an exception to your popup blocker or the software will not function properly. To access their Shakespeare Quartos collection and to get full functionality (saving searches, exporting HTML pages) you have to install the free Insight Java client.

They have a "how-to" page and search tips available.

Library of Congress SourceForge release

Last month the Library of Congress had a soft launch of an open source software release. We officially announced the release in the January 2009 issue of the Library of Congress Digital Preservation Newsletter. This is the first software that the Library has formally released as open source.

The tools are available through SourceForge under the “Library of Congress Transfer Tools” project. The project includes tools for use with the BagIt specification, a hierarchical file packaging format for the exchange of digital content, jointly developed by the Library of Congress and the California Digital Library.

Three tools developed by the Library's Repository Development Group are available now. Parallel Retriever implements a simple Python-based wrapper around wget and rsync to optimize the transfer of content between locations through parallelization. It supports rsync, HTTP, and FTP transfers. Bag Validator is a Python script that validates a Bag, checking for missing files, extra files, and duplicate files. VerifyIt is a shell script that verifies file checksums within a Bag manifest using parallel processes.
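For readers unfamiliar with BagIt, the kind of check a bag validator performs can be sketched in a few lines of Python. This is a minimal sketch, not the Library's Bag Validator or VerifyIt code: it assumes a bag directory holding a `data/` payload folder and a single `manifest-md5.txt` whose lines read `<checksum> <path>`, and it treats a "duplicate" as a path listed more than once in the manifest (an assumption; the actual tool may define duplicates differently). The function names `read_manifest`, `file_checksum`, and `validate_bag` are hypothetical. Checksums are recomputed on a thread pool, loosely echoing VerifyIt's use of parallel processes.

```python
import hashlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_manifest(bag_dir: Path, algorithm: str = "md5") -> list[tuple[str, str]]:
    """Parse manifest-<algorithm>.txt into (checksum, relative_path) pairs."""
    entries = []
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        checksum, path = line.strip().split(maxsplit=1)
        entries.append((checksum.lower(), path))
    return entries


def file_checksum(path: Path, algorithm: str = "md5") -> str:
    """Hash a file in 1 MiB chunks so large payload files are not read into memory at once."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_bag(bag_dir, algorithm: str = "md5", workers: int = 4) -> dict[str, list[str]]:
    """Report missing, extra, duplicate, and checksum-mismatched payload files (hypothetical helper)."""
    bag_dir = Path(bag_dir)
    entries = read_manifest(bag_dir, algorithm)

    # Paths listed in the manifest, with how many times each appears.
    listed = Counter(path for _, path in entries)
    duplicates = sorted(p for p, n in listed.items() if n > 1)

    # Payload files actually present under data/, as bag-relative POSIX paths.
    on_disk = {
        p.relative_to(bag_dir).as_posix()
        for p in (bag_dir / "data").rglob("*")
        if p.is_file()
    }
    missing = sorted(set(listed) - on_disk)
    extra = sorted(on_disk - set(listed))

    # Recompute checksums in parallel for files that are both listed and present.
    to_check = [(c, p) for c, p in entries if p in on_disk]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        actual = list(pool.map(lambda e: file_checksum(bag_dir / e[1], algorithm), to_check))
    bad_checksums = sorted({p for (c, p), a in zip(to_check, actual) if a != c})

    return {"missing": missing, "extra": extra,
            "duplicates": duplicates, "bad_checksums": bad_checksums}
```

Calling `validate_bag("/path/to/bag")` returns a dictionary of lists; an empty list under every key means the payload matches the manifest. A fuller validator would also check `bagit.txt`, any tag manifests, and manifests for other checksum algorithms.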

The Library plans to release additional tools as part of a suite of solutions and software development resources as they are completed over time. There are already more tools in the pipeline.

Friday, January 23, 2009

mobile is the new black

There's a new WorldCat Mobile pilot service.

NYPL has announced its NYPL Mobile beta.

The DC Public Library launched an iPhone app.

Stanford has a new version of an iStanford iPhone app that ties into its student services system.

The International Children's Digital Library launched an iPhone app last November.

technology transition at the white house

There was a great Washington Post article yesterday about how White House technology is "in the Dark Ages." I laughed bemusedly over my toast and read the article aloud at the breakfast table.

The White House is not being singled out. I work for a Federal Agency. I know folks who work at numerous other Federal Agencies, some of whom have worked at said agencies for decades. Federal agencies have many, many rules about hardware and software security, and every agency has to interpret and enforce those rules itself. Differing security levels for content muddy the waters further. This can cause a certain amount of confusion as to what is and isn't allowed. Someone told me that their agency (not the White House) hasn't yet approved Firefox. News that the White House counsel's office approved use of Gmail accounts for some press office activities has been forwarded to many Federal IT units, I'm sure.

Edit, 25 January: Wired has posted a Wired/Tired overview of White House tech, and a list of recent technology projects from various agencies. Nice to see the shout-out for the LoC Flickr project.

Thursday, January 15, 2009

oclc summary of proposed google book settlement

Ricky Erway from OCLC has distilled the proposed Google Book Settlement, its appendices, and the three library registry agreements from 320 pages to a 4 1/2 page summary. It's an excellent overview of the proposal.

d-lib article on some LC tool development

My colleague Justin Littman has just published an excellent article in the January/February 2009 issue of D-Lib Magazine: "A Set of Transfer-Related Services."

"The Office of Strategic Initiative's (OSI) Repository Development Team (RDT) is developing a portfolio of services and components to address the challenges posed by scaling transfer processes. While the portfolio is expanding, the focus of this article will be on two core services, the Inventory Service and the Workflow Service. Before proceeding to examine these services, it will be useful to further delineate the transfer problem space. After examining these services, their role in mitigating preservation risks will be considered."

Monday, January 12, 2009

presidential records and donation reform

On January 7, 2009, the U.S. House of Representatives approved H.R. 35, the "Presidential Records Act Amendments of 2009," and H.R. 36, the "Presidential Library Donation Reform Act of 2009." These were chosen by the House leadership to be the first pieces of substantive legislation passed in 2009, as a symbol of government transparency.

The Presidential Records Act Amendments restores meaningful public access to presidential records by nullifying a 2001 Bush executive order, and the Presidential Library Donation Reform Act requires the disclosure of big donors to presidential libraries. The Senate still has to pass its versions of the bills before they can go to soon-to-be-President Obama to be signed, which he has apparently indicated he will do.

The National Coalition for History provides a good overview of the Records Reform Act. The House Speaker's site provides an overview of both.

Sunday, January 11, 2009

I want a Palm Pre

Pretty much everyone who knows me knows my loyalty to Palm. I've had one since 1997. I've been syncing it with enterprise calendar systems, so I have my personal and work calendars going back to February 1996. I currently have a Centro, and even though I have to manually key in all my work events because we use such an old version of GroupWise that I can't find a sync that works, I am still devoted to my Palm.

I cannot count the number of friends and colleagues who have iPhones and have done their best to convert me. The Urbanspoon app and its clever use of the accelerometer to randomize recommendations by shaking the phone almost had me. My answer is always that if they can promise me that I can port over everything I have on my Centro -- 13 years of calendar, hundreds of contacts, notes, to-do lists, and ebooks -- then I'll consider it.

I am now waiting with bated breath for the Palm Pre. It's such a step forward in the interface (a card stack metaphor), the operating system (WebOS is Linux-based), and the browser (based on WebKit). It doesn't use the Palm desktop software anymore, which will be a paradigm switch for me, but Palm has promised data migration tools for Centro users.

I know that some apps I use won't work anymore, and Raymond's concern about the lack of backwards compatibility in the OS is a real one. But they need to move on and I need to move on, and I'm glad it looks like it will be to another Palm. You can't always be fully backwards compatible. Hey, I only complained a little when I discovered I couldn't run FileMaker Mobile on my Centro, didn't I?

Reviews at Gizmodo, ars technica, PC World, and a PC World FAQ.

Saturday, January 03, 2009

on electronic texts

I just read an article at Information Today by Nicholas Tomaiuolo, an instruction librarian at Central Connecticut State University, entitled "U-Content: Project Gutenberg, Me, and You." He outlines the requirements and steps for preparing an etext for Project Gutenberg.

At one point in the article, there is a discussion about the requirements for full text, not just a PDF created from page images. The author wrote this from the point of view of someone unfamiliar with PG's requirements, illustrating the process one might follow to create an acceptable PG submission: images to PDF, and images to OCR to corrected plain text. I found myself thinking quite a bit about the often-heard statement (not in this article, mind you) that PDF is the ultimate format for texts.

I'm in no way denigrating PDF. PDF is an absolutely required format for texts. It is highly portable, shareable, and readable, and, if the source files are good enough, clearly printable. But it's not innately analyzable or easily repurposed. That requires full text.

I am not unfamiliar with what it takes to create an accurate plain text transcription of a text. When Gutenberg was in its early days, we were really talking about transcriptions, as in people typing in text. OCR has greatly streamlined that process, but the proofreading required is non-trivial. Want to work with a highly formatted text, or one with tables or formulae or figures? Challenging. Adding layers of structural and semantic markup to plain text, as with TEI, is time-consuming. Rich markup, including identifying dates or names or geographical places, or providing normalized versions of said dates and names, is a large undertaking. A full text with structural and semantic markup can be repurposed into many formats, including ebooks and PDF.

And you do want ebooks. Some months ago I had the great opportunity to demonstrate the prototype World Digital Library site at the National Book Festival. There is no greater focus group than thousands of people who love to read! The two top requests were that the books should be downloadable as ebooks and that all the text content be available as full text in all seven project languages. These were not academics or librarians (although there were some of the former and many of the latter who stopped by), but parents and commuters and researchers and genealogists.

Both are daunting requests when you do not have full text available to work from. There will be PDFs. Ebooks and full text in all seven languages are goals to strive for.

Friday, January 02, 2009

flickr commons developments

I am not part of the Library of Congress Flickr project team, and I in no way speak for them.

There is a lengthy discussion on Wired and at Found History about The Commons on Flickr, given that Yahoo laid off George Oates, the key staff member who shepherded the project. It doesn't seem that the project is in any imminent danger, and a dedicated group has stepped forward to evangelize, innovate, and curate thematic sets from across the collection. I'm thrilled to see a community of use developing around The Commons, but it's sad that this is what precipitated its full coalescence.

DCC obsolescent data and files challenge

There's still time to send a message to Chris Rusbridge at the Digital Curation Centre to enter his personal data recovery challenge:

I will do my best to recover the first half dozen interesting files that I’m told about… of course, what I really mean is that I’ll try and get the community to help recover the data. That’s you!

OK, I define interesting, and it won’t necessarily be clear in advance. The first one of a kind might be interesting, the second one would not. Data from some application common of its time may be more interesting than something hand-coded by you, for you. Data might be more interesting (to me) than text. Something quite simple locked onto strange obsolete media might be interesting, but then again it might be so intractable it stops being interesting. We may even pay someone to extract files from your media, if it’s sufficiently interesting (and if we can find someone equipped to do it).

The only reference for this sort of activity that I know of is (Ross & Gow, 1999, see below), commissioned by the Digital Archiving Working Group.

What about the small print? Well, this is a bit of fun with a learning outcome, but I can’t accept liability for what happens. You have to send me your data, of course, and you are going to have to accept the risk that it might all go wrong. If it’s your only copy, and you don’t (or can’t) take a copy, it might get lost or destroyed in the process. You’ll need to accept that risk; if you don't like it, don't send it. I might not be able to recover anything at all, for many reasons. I’ll send you back any data I can recover, but can’t guarantee to send back any media.

The point of this is to tell the stories of recovering data, so don’t send me anything if you don’t want the story told. I don’t mind keeping your identity private (in fact good practice says that’s the default, although I will ask you if you mind being identified). You can ask for your data to be kept private, but if possible I’d like the right to publish extracts of the data, to illustrate the story.

His deadline is Twelfth Night, January 5, 2009. Read his post for the rest of the details.

I've been thinking a lot lately about data migration and recovery, and I think this is a great way to illustrate the challenges and solutions -- with real data and the stories behind its creation, loss, and potential recovery.

make

I am definitely a proponent of the self-made. Cooking, clothing, jewelry, assisted forays into small electronic projects, etc. At various times I have designed and made costumes, painted, made paper, etched prints, made lampwork glass beads, and learned the basics of Chinese brush painting. I haven't had as much time for that in the last couple of years, but that doesn't mean that I don't still strive to make.

I'm a big fan of Make, and so are a lot of others. In that vein, William Turkel has posted his list of books for humanist makers.

interview with Vint Cerf

Siva Vaidhyanathan has posted his interview with Vint Cerf about developments in search technology and Google.

Thursday, January 01, 2009

resolutions

I don't know if these are resolutions or goals ...

I have been getting back to writing in the past few weeks, and I will write more about what we're working on to help get the word out.

I will be a more hard-nosed project manager vis-a-vis deadlines. I am too often too understanding of delays.

I will explore Washington D.C. more. I lived in metro D.C. for part of my childhood and have been in Virginia for 7 years, but still have a disconnected sense of the city. Perhaps that's in part an artifact of traveling by Metro and not having a real sense of the city's topography.

I will meet more people in other divisions at LoC. This is harder than it sounds.

I will eat lunch at my desk less often. I will make more of an active effort to organize social opportunities. (Why don't I remember this every time I think about switching jobs -- it gets harder and harder to build a new social network every time.)

I have faith that our house in Charlottesville will finally sell.

EDIT: There is another long term goal that I'm already working on: I've signed up for a Japanese class through the Federal employee graduate school. This is the year to begin transforming my ragtag knowledge of Japanese into usable language skills.
