Thursday, December 31, 2009

2009 in review

As is always the case for me at the end of the year, I find myself waxing nostalgic.

What were my favorite books of the year?

  • Finch
  • The City and the City
  • Sandman Slim
  • The Chalk Circle Man
  • Chronic City
  • The Year of the Flood
  • The Girl Who Played with Fire
  • The Public Domain: Enclosing the Commons of the Mind
  • Cheap: The High Cost of Discount Culture
  • Free: The Future of a Radical Price
My favorite movies?
  • District 9
  • Food, Inc.
  • Julie and Julia (the Julia parts)
  • Coraline
  • Star Trek
  • Coco Before Chanel
  • Up (I know, I don't usually like sentimental things, but this was just so darned likable)
There's a longer list of movies I haven't seen but want to:
  • Avatar
  • Sherlock Holmes
  • Up in the Air
  • A Single Man
  • A Serious Man
  • The Young Victoria
  • Bright Star
  • Inglourious Basterds
  • 9
  • Fantastic Mr. Fox
  • The September Issue
  • The White Ribbon
  • Bad Lieutenant
  • Creation was never released in the U.S., but it looks like I'll get to see it at a screening in January
My favorite conferences?
  • OR09
  • DigCCurr 2009
  • 2009 NDIIPP partners' meeting
What did my group accomplish?
  • We released our BIL Java library on SourceForge to support the BagIt standard. Kudos to Justin Littman and Brian Vargas.
  • We moved a number of our tools into supported LoC production, and opened up some of our in-development tools for limited external partner testing. Kudos to Justin Littman, Dan Chudnov, Dan Krech, Paul Petty, Jon Steinbach, Chun Yi, Praveen Bokka, Sohail Aslam, and Brian Vargas.
  • We launched an expanded internal LoC transfer and workflow service with a greatly improved UI (and more features and improvements to come). Kudos to Justin Littman, Dan Chudnov, Paul Petty, Chun Yi, and Brian Vargas.
  • The National Digital Newspaper Program hit a million-page milestone and updated their entire underlying infrastructure. Congratulations to David Brunton, Deb Thomas, Ray Murray, Ivey Glendon, Henry Carter, Tonijala Penn, Dory Bower, Ed Summers, Dan Krech, Dan Chudnov, Curt Harvey, Justin Littman, and Brian Vargas.
  • The World Digital Library launched. Congratulations to Dave Hafken, Michelle Rago, Sandy Bostian, Kapil Thangavelu, Risa Ohara, Mike Giarlo, Sohail Aslam, Paul Petty, Chun Yi, and Laura Keen.
  • Thanks to our QA testing team for all their hard work on all the group's projects during the year: JoKeeta Joyner, La Tonya Freeman, Tasmin McDonald, and Preethi Mothkupally.
  • Thanks to our Ops team - Scott Phelps, Salim Malik, Ken Stailey, and Kurt Yoder - for all their support on all our group's projects this year.

stories from a maker childhood

A TV show about vintage toys brought on a discussion in our house of toys we had when we were kids. Not surprisingly to anyone who knows me, my favorite activities were making things and reading.

My love of all things spooky and supernatural was inborn. My earliest comic books, at 5 years old, were Casper the Friendly Ghost and Wendy the Little Witch. I attended church preschool at The Falls Church, and I was often found wandering the small cemetery there. I still have a glow-in-the-dark ghost family that we bought one figure at a time on visits to the drug store in Falls Church. It should be no surprise, then, that my absolute favorite toy from my childhood was the Thingmaker. I have a photo from either Christmas 1968 or my birthday in 1969 where I am joyfully displaying my favorite present: a mold with which to make my own little skeletons.

Don't remember the Thingmaker? It was later rebranded as "Creepy Crawlers." It was basically a hot plate with metal molds into which you poured colored "Goop." The heat set the goop in the mold, and once the mold had cooled you had soft rubbery things. There were molds for bugs, but there were also molds for casting parts to assemble larger items (skeletons) or 3D objects (so-called "Dragons," which, after clackers, were THE hot trading item when I was in the 2nd grade; clackers were those glassy resin balls on thin rope that you clacked together to make really loud noises and sort of perform tricks). You could mix the colors of goop and create some really startling color combinations. In later years they also had "jewel" molds with jewel powder to cast hard plastic jewels. I am sure the company that manufactured these made quite a bit off the goop and jewel powder consumables. The product disappeared and came back in the 1990s in a safer version, but it just didn't look as good to me. OK, I guess it's no longer acceptable to give 5-year-olds a toy consisting of an open hot plate, metal molds, and some flimsy tongs with which to extract the hot molds, but I really loved that toy.

I did not have an Easy-Bake Oven. Mom would give me a toy that was an open hot plate but not one with an enclosed light bulb? With a neighbor girl, I once staged the roasting of my talking Bugs Bunny puppet in a roasting pan in a dresser drawer. I spent a lot of time at her house because they had a color TV and her mother would let us watch "Dark Shadows" after school.

I was not one for playing much with dolls. I had a large baby doll and baby furniture; I cherished and still have the doll blankets that my mother knitted, but I can barely remember the doll. I had Barbie dolls (the rotating electric kitchen of the future was my favorite accessory), a Crissy doll with hair that grew and retracted again (that fascinated me), and Dawn dolls (I loved her dress with the crystal-pleated organza skirt, and her beach house with the inflatable pool). I later transformed some of those dolls into superheroes by making them little costumes. Then there was the one I turned into an Andorian by painting her skin blue with permanent marker and coating her hair with Liquid Paper. I gave all those dolls away to the daughter of a friend in the late 1980s.

I did have a dollhouse, my most desired gift for Christmas 1969: the Imagination dollhouse, an amazing reconfigurable, mid-century modern style plastic dollhouse made up of three movable, transparent colored plastic structures. The figures and furnishings were all sleek and modern. I know they sometimes appear on eBay. In a future where nostalgia overwhelms me and I am flush with cash and storage space, I may buy one.

I never had Legos, but I did have Lincoln Logs. I have no idea if Mom knew they were developed by one of Frank Lloyd Wright's sons, but she had a Frank Lloyd Wright obsession (she lived in the FLW Imperial Hotel in Tokyo in the 1950s) that she passed along to me.

Mom was a maker at heart. She knit and crocheted, and had a fondness for paper crafts. Somewhere I have a picture from Easter 1970 showing an astonishing tableau on a table: two stylized rabbits whose clothing/bodies were constructed from a number of coordinated patterned and plain-colored glossy stiff papers (why do I remember that the paper came in folded squares from a Hallmark store?), whose heads were decorated blown eggs made to look like bunnies, and who wore an Easter hat and an Easter bonnet perched over their ears. I know the templates came from a magazine. Mom kept them for years, but I did not find them when I cleaned out her house after her death.

We crafted a lot together. Some time around 1970 Mom bought The McCall's Giant Golden Make-It Book for us. It was full of templates and instructions for dozens of projects. Mom was annoyed by my lack of patience in waiting for glue to dry and my insistence on using Scotch tape for every paper project instead. She despaired of my seemingly profligate use of tape. Yes, I still have that book.

Mom was an excellent cook but a so-so baker. At Christmas she obsessed about making cookies. Her attempts at bread were disastrous, so she resorted to frozen bread dough. Her Pfeffernusse were like dog kibble. When I was in elementary school I bought The Cookie Book by Eva Moore through one of my Scholastic book orders. It had one recipe for each month of the year, and we made its December sugar cookie recipe with "Peanuts" Christmas cookie cutters which we decorated in glorious detail. (I still have the book, too, and think it has the best Snickerdoodle recipe.) Mom also had a cookie press and made butter cookies that she dyed in batches of red and green. Some years they were very pale tints and some years they were very vivid. Both were unappetizing to look at but yummy. I have that cookie press in its original box with all its dies, and I use it almost every Christmas. I do NOT dye my dough.

I never picked up knitting. I was OK at crocheting. I loved embroidery, needlepoint, and sewing. Mom taught me to sew, I had classes as part of my Girl Scout sewing badge, and I took a summer school needlearts class (we will not speak of my knitting attempts in that class). I still sew but I haven't tried anything else in decades. I am daunted by my expert knitting friends.

Mom had excellent copyist drawing skills. She never created any original works that I remember, but she could copy anything. She was astonishingly skilled with charcoal and pastels. She and I took an oil painting class together when I was a child; the instructor must have been extraordinarily understanding to let a single mother bring her elementary-aged daughter to class with her. Luckily I was a good painter. My drawing skills were never great. I took lessons in Chinese ink painting in middle school, and I somehow talked my way into a life-drawing class when I was 17 (in post-Proposition 13 California most high school art classes were canceled, so I took adult education and community college classes). My high school classmates just could not deal with my drawing nude models. I also took a print-making class. I was working with oversize printing plates and had to handle them in the acid with my bare hands. The yellow and black chemical discoloration of my hands freaked out my high school chemistry teacher, who was afraid I'd done it in her class. She was relieved and horrified when I told her what I was doing.

My early childhood rooms in a number of houses were decorated with little paint-by-numbers paintings. Mom loved the precision of those little kits and pots of paint. I hated the clowns and strange, stylized dogs. I still have the ocean scenes she painted for me. I don't remember doing this myself. I preferred playing with Colorforms when I was 4 and 5. And my Etch-a-Sketch. And my favorite toy before I got my Thingmaker: a Lite-Brite. I always used up the black sheets of construction paper from the mixed-color pads first, as Lite-Brite refills. The sensation of pushing the light pin through the paper and through the round mesh and seeing the pin light up was just so cool.

I may have had a couple of tiny Liddle Kiddle dolls, but what I loved was my Liddle Kiddle branded tracing light box. I used that light box (its body was lavender plastic) for at least 15 years, until it finally died in the mid-80s. I also had a Barbie branded "fashion plate" set that consisted of a series of outline templates that you used to draw Barbie figures that you could color in. There were patterned rubbing plates you could use to create textures. I loved the fashion design aspect of it.

Didn't every kid in the 1970s have a Spirograph? I could create intricate patterns for hours, and I kept a stash of colored ballpoint pens. Actually, I know not every kid had one, because I always took mine with me to my cousins' house, along with the Barbie fashion plates. My cousin Sandy may have had the Mousetrap and Green Ghost games, but I had the Spirograph and the fashion plates.

Thursday, August 13, 2009


My colleague Thorny Staples often uses the metaphor that digital humanities projects are, at their most basic level, online exhibitions. Curated content is presented with key descriptive information not unlike exhibition tombstone labels and contextualized through categorization and by scholarly essays of varying lengths as well as site information architecture (not unlike rooms of an exhibition with wall texts). The end results include the identification and explication of relationships and the presentation of deep readings of objects. That metaphor always resonated with me.

In a recent discussion a small group was trying to work out some generalized models for the processes we follow from the receipt/creation of digital files through to providing access. We were having a particularly lengthy discussion about description and contextualization -- at what point in a digital file's life cycle is it related to other files and identified as a digital object, and at what point is some sort of intellectual meaning overlaid onto that digital object?

My new colleague Terry Harrison -- a big fan of using metaphors -- commented that when museums acquire objects they cannot know every context in which the object will be exhibited or published in the future, but they acquire it and put effort into description and conservation to prepare for future display/publication when the object will be contextualized many times over.

This sent me down the road to a metaphor that's still developing in my head, which may not yet translate to something that anyone besides me thinks is sensible. Or it may not be sensible at all.

First, I'm starting with an assumption that there are four very broad categories of activities that we need to describe (leaving out "preservation" for now). On the museum side, it's these:

Acquisition: Items are proposed, selected, and acquired
Accessioning: Items have accession numbers assigned, are assigned storage locations, relationships between parts are identified (a tea set is made up of individual components), and basic descriptive information is recorded in a registration system
Preparation: Items are cleaned, repaired, mounted, framed, or otherwise stabilized and made ready for research use and public viewing
Exhibition: Items are further described and presented in the context identified by a collection or exhibition curator; an object will be exhibited many times and assigned to multiple contexts

This roughly translates to this in the digital realm:

Creation/Transfer: Selection and digitization or transfer of digital (master?) files to an institution
Inventory: Files are assigned identifiers/names, placed into some sort of meaningful (or not) storage location in a server environment
Processing: QA, manipulation, derivative creation
Access: Making content discoverable and usable, which can include a curator providing context and intellectual overlays for objects (not files)

I'm having one real issue in making this metaphor work for me and for others, and that's around the creation of metadata and recording of file relationships. At what point is the relationship of files to each other recorded? Is the creation of metadata identifying/describing an intellectual object part of inventory, processing, or access? When is the relationship of files to that intellectual object recorded?

I think that inventorying should include a step whereby the relationships between files are recorded, so that it is recognizable that some set of 300 files go together. There wasn't a lot of pushback on this in our discussion. The questions of when descriptive metadata for an intellectual object is created, and when the relationships of files to an intellectual object are recorded, engendered a lot of discussion. I personally think that descriptive metadata for the intellectual objects represented by those files is also created during the inventory stage, and that files in hand at that stage should in some way be associated with the intellectual objects at that time.

This is complicated because recording all the relationships of files to intellectual objects is not fully possible until objects are prepared and added to an access application. That's where the contextualization happens, so one can argue that that is where intellectual objects are truly defined and where files are associated with objects. Preparation is driven by access. If access applications are siloed at all, each might use different derivative files, and there has to be some association of those derivatives to the master and to the intellectual objects.

So, we have master files, derivative files (possibly multiple sets over time per access point), intellectual object metadata, relationships of all files to each other and to that intellectual object, and the need to inventory and manage all of the above, possibly separately from any single access application or set of access points. Where is this recorded, in what order, and how do we describe these activities? I'm struggling with that part of the metaphor/model.
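To make the moving parts concrete for myself, here is a minimal Python sketch of one way an inventory could hold these records. Everything in it is hypothetical: the class and field names, the choice to record the file-to-object relationship as a list of master-file identifiers on the object, and the "access_point" label on each derivative are my assumptions, not a description of any system we run.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FileRecord:
    """A file under inventory control (hypothetical schema)."""
    identifier: str
    path: str
    role: str                           # "master" or "derivative"
    derived_from: Optional[str] = None  # master identifier, for derivatives only
    access_point: Optional[str] = None  # access application that uses a derivative


@dataclass
class IntellectualObject:
    """Descriptive metadata plus the master files that make up the object."""
    identifier: str
    title: str
    master_ids: List[str] = field(default_factory=list)


class Inventory:
    """Hypothetical registry tying files to each other and to objects."""

    def __init__(self) -> None:
        self.files: Dict[str, FileRecord] = {}
        self.objects: Dict[str, IntellectualObject] = {}

    def add_master(self, identifier: str, path: str) -> None:
        self.files[identifier] = FileRecord(identifier, path, "master")

    def add_derivative(self, identifier: str, path: str,
                       master_id: str, access_point: str) -> None:
        # A derivative is only accepted if its master is already inventoried.
        master = self.files.get(master_id)
        if master is None or master.role != "master":
            raise KeyError(f"unknown master file: {master_id}")
        self.files[identifier] = FileRecord(identifier, path, "derivative",
                                            master_id, access_point)

    def describe_object(self, identifier: str, title: str,
                        master_ids: List[str]) -> None:
        # Object description can happen at inventory time, once masters exist.
        missing = [m for m in master_ids if m not in self.files]
        if missing:
            raise KeyError(f"uninventoried files: {missing}")
        self.objects[identifier] = IntellectualObject(identifier, title, list(master_ids))

    def files_for_object(self, object_id: str,
                         access_point: Optional[str] = None) -> List[str]:
        """Masters plus all derivatives, or only one access point's derivatives."""
        masters = set(self.objects[object_id].master_ids)
        found = set(masters) if access_point is None else set()
        for rec in self.files.values():
            if rec.role == "derivative" and rec.derived_from in masters:
                if access_point is None or rec.access_point == access_point:
                    found.add(rec.identifier)
        return sorted(found)


# Example: two page masters, derivatives for two different access points.
inventory = Inventory()
inventory.add_master("m1", "masters/page001.tif")
inventory.add_master("m2", "masters/page002.tif")
inventory.add_derivative("d1", "viewer/page001.jpg", "m1", "page-viewer")
inventory.add_derivative("d2", "viewer/page002.jpg", "m2", "page-viewer")
inventory.add_derivative("t1", "thumbs/page001.png", "m1", "search-results")
inventory.describe_object("obj1", "Example two-page letter", ["m1", "m2"])

all_files = inventory.files_for_object("obj1")        # ['d1', 'd2', 'm1', 'm2', 't1']
viewer_files = inventory.files_for_object("obj1", access_point="page-viewer")  # ['d1', 'd2']
```

In this sketch the master-to-object links are recorded at inventory time and each access point registers its own derivatives later, which is one possible answer to the ordering question, not the only one.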

How did this conversation arise? Well, we're trying to scope out some future directions and activities, and a shared understanding of the model for the activities we support is vital. Mine is not the only model proposed and it just may not be right. I'm sharing this as much for my own process as anything else.

Tuesday, June 30, 2009

LoC on iTunes

The Library of Congress now has content on iTunes U. iTunes U is the area of the iTunes Store which offers open educational audio and video content from universities and other educational institutions. The Library’s initial iTunes U content includes historical videos such as original Edison films and a series of 1904 films from the Westinghouse Works, as well as event videos such as author talks from the National Book Festival, the "Books and Beyond" series, discussions with curators, and lectures from the Kluge Center. The audio content includes Library podcast series such as "Music and the Brain," slave narratives from the American Folklife Center, and interviews with authors from the National Book Festival. The collection also includes Library-produced classroom and educational materials, such as courses from the Catalogers’ Learning Workshop.

You must be running iTunes to be able to view the LoC content.

Saturday, June 27, 2009

new BIL on SourceForge and update to BagIt spec

This week saw a couple of events around the BagIt specification and tools.

A revision of the BagIt specification went out this week. You will note that it is still 0.96 -- the revisions were only in language to clarify some questions that had been received. There are some discussions going on about 0.97 - join the Digital Curation Google group. I'd like to see some more activity there!

Version 3.0 of BIL, the BagIt Library for Java, was released on SourceForge this week. It's available as binary and source code.
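For anyone wondering what a bag actually looks like on disk, here is a minimal sketch in Python (not BIL, which is Java) of writing and checking a bag in the layout I understand version 0.96 of the spec to describe: a bagit.txt declaration, a data/ payload directory, and a payload manifest listing a checksum and a path for each file. The function names are my own, MD5 is just one choice of checksum algorithm, and a real validator would also handle the optional tag files the spec describes.

```python
import hashlib
import os
import tempfile


def _md5(path):
    """Checksum a file in chunks so large payload files are not read into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(bag_dir, payload):
    """Write bagit.txt, the data/ payload, and manifest-md5.txt.

    payload maps a relative path (using "/") to file contents as bytes.
    """
    data_dir = os.path.join(bag_dir, "data")
    os.makedirs(data_dir, exist_ok=True)
    with open(os.path.join(bag_dir, "bagit.txt"), "w", encoding="utf-8", newline="\n") as f:
        f.write("BagIt-Version: 0.96\n")
        f.write("Tag-File-Character-Encoding: UTF-8\n")
    manifest_lines = []
    for rel_path, content in sorted(payload.items()):
        target = os.path.join(data_dir, *rel_path.split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as f:
            f.write(content)
        manifest_lines.append(f"{_md5(target)} data/{rel_path}\n")
    with open(os.path.join(bag_dir, "manifest-md5.txt"), "w", encoding="utf-8", newline="\n") as f:
        f.writelines(manifest_lines)


def invalid_payload_files(bag_dir):
    """Return payload paths that are missing, altered, or not listed in the manifest."""
    problems, listed = [], set()
    with open(os.path.join(bag_dir, "manifest-md5.txt"), encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            expected, rel_path = line.split(None, 1)
            listed.add(rel_path)
            target = os.path.join(bag_dir, *rel_path.split("/"))
            if not os.path.isfile(target) or _md5(target) != expected:
                problems.append(rel_path)
    # Any file under data/ that the manifest doesn't mention is also a problem.
    for root, _dirs, names in os.walk(os.path.join(bag_dir, "data")):
        for name in names:
            rel = os.path.relpath(os.path.join(root, name), bag_dir).replace(os.sep, "/")
            if rel not in listed:
                problems.append(rel)
    return sorted(problems)


with tempfile.TemporaryDirectory() as tmp:
    bag = os.path.join(tmp, "example_bag")
    make_bag(bag, {"page001.txt": b"hello", "images/page001.tif": b"\x00\x01"})
    before = invalid_payload_files(bag)   # []
    with open(os.path.join(bag, "data", "page001.txt"), "ab") as f:
        f.write(b"!")                      # simulate damage in transfer
    after = invalid_payload_files(bag)    # ['data/page001.txt']
```

The point of the manifest is exactly what the last few lines show: the receiving end can recompute checksums and know whether what arrived is what was sent.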

Plus, there was the BagIt video ...

BagIt video

The first in a planned series of digital preservation videos is available on the site -- an introduction to BagIt! Brian Vargas did a great job as "the talent" -- i.e., the narrator -- but folks should know that Brian was not selected just for his acting experience: he wrote many of our transfer tools (like the transfer scripts on SourceForge) and is a co-author of the BagIt specification.

The video premiered this week at the annual NDIIPP Partners' Meeting to great acclaim. It's aimed at a general audience.

EDIT: The NDIIPP site has added a great new page on the Transfer Tools with a link to the video.

Friday, June 26, 2009

Chesapeake Project Legal Information Archive

I came across a very interesting resource today -- the Chesapeake Project Legal Information Archive -- and the just-released results of a study they did on archiving legal resources on the web:

The Chesapeake Project Legal Information Archive has released a comprehensive report evaluating its digital preservation efforts during the project's two-year pilot phase.

The project evaluation reveals that nearly 14 percent — or approximately one in seven — of the online publications archived between March 2007 and March 2009 have already disappeared from their original locations on the Web but, due to the project's efforts, remain accessible via permanent archive URLs. A similar analysis in 2008 showed that slightly more than 8 percent of archived titles had disappeared from their original URLs, demonstrating a dramatic increase in "link rot," or inactive URLs, among archived content over the past year.

During the two-year pilot phase, the libraries participating in the project archived more than 4,300 digital objects and tracked more than 177,000 visits to, the home of The Chesapeake Project's digital archive collections. Users of the project's Web site visited from educational, government, and military institutions in the United States, as well as from countries abroad throughout the Americas, Europe, the Middle East, Asia, Africa, Australia, and the Pacific Islands.

Not too surprisingly, .edu is the domain class with the second-highest level of resource loss, after .info. Academic institutions are not always very conscientious about preserving access to their content, and with their academic term structure and the movement of faculty between institutions, web content on .edu sites is highly variable in its longevity. I don't see a characterization of how old the harvested resources are (that can be very difficult to identify), but it is a high percentage of link rot, and there was quite an increase from the end of the first year to the end of the second year.

Download the PDF of their report.

Tuesday, June 16, 2009

milestones for the National Digital Newspaper Program

Today there was an exciting press event at the Newseum for the National Digital Newspaper Program, sponsored by the Library of Congress and the National Endowment for the Humanities. There was a great live demo, a video on digital production for the project from the University of Kentucky, and some nice speechmaking. The event promoted the milestone where the project surpassed 1,000,000 pages available at the Chronicling America site, the addition of seven new state partners, and the addition of images of illustrated newspaper supplements to the LoC Flickr Commons set (with more to come every month).

So far the AP has an article available, and there were representatives of other news outlets at the event. Check out the press release. Roy Tennant has a post that includes some of the technical specs supplied by my colleague Ed Summers. Ed and Dan Krech have done some great work to update the underlying application, improving the ingest and search functionality, adding the functionality that allows the site to be crawled, and exposing the data as RDF for a multitude of possibilities.

Edit: Here's the Washington Post article, and the official LoC blog posting.

Saturday, June 13, 2009

something odd happened today

Last weekend I went to my local public library (which I love), where I spotted a book that was on my to-be-read list. I keep a list of books I want to read, and periodically search the library's catalog to see if they have them at any of their branches. I had this book noted on my list as being held in the collection of my local branch. Depending upon how much I want to read a book, I'll put a hold on it if they have it in the collection but it isn't checked in. This book held a middling position on my list for a while: a 2007 sequel to a science fiction novel by a newish but award-winning author, a novel I liked but didn't love, so I thought the sequel might be interesting. I grabbed the book off the shelf, but in the process of wandering around and gathering up other books, I must have set it down, and it didn't make it to the self-checkout with me, something I didn't discover until I got home. Ah well, I knew I'd be back this weekend, and maybe it would still be available.

I returned today and wandered over to the shelf. It wasn't there. I decided to look the book up and see when it was due and put a hold on it this time.

It wasn't there any more. It wasn't in the catalog, and the author wasn't in the catalog either.

I left with the books I found and one that was on hold for me. I considered asking about the missing book/author, but there was quite a line and I didn't want to hold people up while I asked my crazy-conspiracy-sounding questions -- how did this author and his books disappear in the last week? And why?

Tuesday, May 26, 2009

how did a month go by?

In re-writing the opening sentence to this post about seventeen times, I have alternated between apologizing, rationalizing, making excuses for, and outright ignoring that I haven't posted here in a month.

I've been attending conferences and traveling a lot. Four meetings/trips in three weeks, and four states (yes, one state was Virginia, but I was off site for three days, followed the next day by a trip over two hours away and overnight for two nights, so that counts). That doesn't stop most folks from continuing to reach out and share, but I find travel very draining. I can happily spend my days chatting with colleagues, taking notes and tweeting, and talking about what excites me about my job. By the time I collapse in my room at the end of the day, I sometimes feel like I hope to never discuss the BagIt specification again (But I will, you know I will, and with great enthusiasm). And when I get home, I hole up and do not feel social for a good 24 hours. Yes, I might be the most outgoing Myers-Briggs "I" out there, but I'm still an I who just wants to sit quietly and think for a while.

And, if I also want to make some semi-valid excuses, my work PC died again and it was out of my possession for 3 weeks, one of my projects had a major deadline that was almost fully met on time and required some last minute scrambling on my part so I didn't blow the deadline too badly, and we had to pack up and move out of our office suite so some duct repairs could take place. I should not even admit how far behind I am in studying for my Japanese class.

I hope to resume normal blogging this week. The coming attractions: the IS&T Archiving 2009 conference, Open Repositories 2009, and a visit to SCOLA, the Library's international newscast preservation partner.

Monday, April 27, 2009

Digital Karnak

I am a huge fan of 3-D visualizations of archaeological sites, and there's a new one developed by a team under Diane Favro and Willeke Wendrich at UCLA. Digital Karnak provides a Google Earth visualization of the site of Karnak, a massive temple complex in Egypt that was in use for some 1,500 years. There's a nice interactive timeline through which you can view the development of the site over time. Start with the overview if you're unfamiliar with Karnak.

The web site includes an amazing archive consisting of stills from the 3-D model and photographs from the archaeological site. I'd like to see that expanded some day to include any smaller objects from Karnak that are in various cultural heritage collections. Historical renderings (there are known drawings from the early 18th century onwards) would also be a nice addition.

There's a nice article in the Chronicle of Higher Education.

Tuesday, April 21, 2009

World Digital Library Launch

The World Digital Library is now available.

The site is launching with 1,170 objects from 26 partner institutions. WDL focuses on significant primary materials reflecting the cultural heritage of all UNESCO member countries, including manuscripts, maps, rare books, recordings, films, prints, photographs, architectural drawings, and other types of primary sources from varying time periods. The project will continue to add content to the site, and will enlist new partners from the widest possible range of institutions and countries.

The site is available in seven different languages: Arabic, Chinese, English, French, Russian, Spanish, and Portuguese. The content is not translated -- the items appear in their original language. The metadata and all the site navigation are translated to make it possible to search and browse the site in any of the languages. The metadata came from partner institutions or was created by catalogers at the Library of Congress, and much of the translation was provided by Lingotek.

The site was built using the Django Python framework, nginx, Lucene/Solr, and a MySQL database. The zooming in the image viewer and page turner is Seadragon Ajax. There is heavy use of JavaScript, jQuery, JSON, and underlying XML. Check out the image carousels and timeline tool! The project also developed a cataloging tool to manage the metadata and cataloging process and interact with the Lingotek translation system via their API.

Sunday, April 12, 2009

museum data exchange software

OCLC, funded by the Mellon Foundation and working with the software company Cognitive Applications, Inc, has released COBOAT and OAICat Museum to support data interchange between museums. This work is happening under the auspices of their Museum Data Exchange Project.

So what, many people will say? It should already be easy to share museum data, right?

Not so much.

The museum collection management system arena has some major vendors (Gallery Systems, Willoughby, Minisis, Cuadra, etc.) and some smaller vendors (Re:discovery, PastPerfect, etc.), and countless (and I really mean countless) home-grown systems running on FileMaker, Access, and MS-SQL. I know, because I spent many years working for museums and I was on the board of the Museum Computer Network, a group that diligently worked on many interchange initiatives. I worked with software from 3 vendors and managed a FileMaker-based system. Getting data in was easy. Getting data out was often hard. Participation in data aggregation projects took a lot of effort. And most small- or medium-sized museums (and there are many, many more of them than large museums) have little or no technology staff to enable data sharing. And there is no common data schema in the community.

The museum community itself has sometimes slowed progress. When relevant library community standards were mentioned, some said "We're nothing like libraries! Our collections are unique! Their standards are not for us!" That attitude seems to have shifted in the last 10 years.

I am glad to see something like this going forward. A fee-free tool that can help museums extract data from black-box vendor systems and enable sharing? Bring it on.

Friday, April 10, 2009

open repositories 2009

The abstracts are now available for the presentation and poster sessions at OR09. This is one of my favorite conferences to attend and present at.

Sunday, April 05, 2009

DigCCurr 2009

I was in Chapel Hill the first week of April for the DigCCurr 2009 conference and to attend a meeting to brainstorm about personal digital collection preservation. I thought the conference was very good, better than the first one in 2007. I saw many excellent presentations, had some great conversations, and got a good response to my presentation on LC's work with file transfer and inventory tools. As with the last conference, I walked out thinking that I should have been an archivist.

I strongly recommend the proceedings from DigCCurr 2009. They're available as a free download from Lulu, or you can buy a POD version. You can also look up the very active twittering history at #digccurr.

I found it strangely hard to write up my notes from this meeting. I think it's because I'm still struggling with some aspects of the digital preservation problem space.

I absolutely agree that the activities of traditional archival practice have a place in the preservation of digital records. Where I found myself disagreeing with some presenters is in the balance between collecting and saving what we can versus an appraisal process to select what we will collect/save. In collection development practices for general collections, there is the often-held discussion about never knowing what might prove useful in the future, so it is a disservice to be too selective now. I guess that I have taken that point of view to heart, and I want to see our institutions cast as open a net as possible for digital collections. If we don't grab it when we can, there will be nothing to select.

I also found myself bristling occasionally over the implied scope of the term "digital collections" as I most often heard that phrase used at the meeting. There was very much a focus on electronic records and the digital realm of personal papers. Of course there were some great discussions around multimedia, web sites, audio/video, and image collections, but what I pretty much never heard anybody mention was born-digital scholarship and teaching and learning materials.

My first web site preservation project was at the Harvard Design School in the late 1990s, where, while developing courseware software, I realized that we were losing the history of what we taught and the products of the courses as we overwrote sites every term. Part of an institution's records are its lists of course offerings, course syllabi and reading lists, and, for some courses, the projects that the students created and put online in the course site. This was particularly true at a graduate school with programs in architecture, landscape architecture, and urban planning, where the studio courses produced important site-specific work and case studies that were often lost after every term. I felt so strongly about this that I launched a course site preservation project that would have involved retrieving sites off server archives. We were looking at using METS (in its early days) to map the sites. But, as often happens, I ended up leaving before the project got very far along, and since no one else felt nearly as devoted to it as I did, it stalled.

At UVA we launched a project called "Sustaining Digital Scholarship" to preserve born-digital scholarship, primarily in the humanities and social sciences. We instituted a technical assessment process and were working on documenting and migrating some major digital scholarly resources with varying strategies. That project is still going on in a limited way. It can take a lot of resources to assess and document a large digital archive.

That said, I was excited by some of the tools that I saw. ACE from the University of Maryland. MOPSEUS from Greece. The PARSE.Insight draft preservation roadmap. CASPAR for representation information. PLATO and Hoppla from Austria. LANL's ReMember Framework for OAI-ORE. CDL's Pairtree directory structure. Prometheus and MediaPedia from Australia. All very much worth looking into.
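To give a flavor of one of these, CDL's Pairtree convention maps an object identifier onto a directory path by cleaning the identifier and then splitting it into two-character segments. Here is a minimal Python sketch of that mapping as I understand the Pairtree draft; the function names and the `pairtree_root` default are my own, and it leaves out details such as the object directory at the end of the path, so treat it as illustrative rather than a conforming implementation.

```python
# Minimal Pairtree path-mapping sketch (simplified; not a conforming implementation).

# Characters hex-encoded in addition to anything outside visible ASCII.
SPECIAL_CHARS = set('"*+,<=>?\\^|')


def pairtree_clean(identifier: str) -> str:
    """Clean an identifier so every character is safe in a directory name."""
    encoded = []
    # Pass 1: hex-encode (as ^xx) any byte outside visible ASCII or in SPECIAL_CHARS.
    for byte in identifier.encode("utf-8"):
        char = chr(byte)
        if byte < 0x21 or byte > 0x7E or char in SPECIAL_CHARS:
            encoded.append(f"^{byte:02x}")
        else:
            encoded.append(char)
    # Pass 2: single-character substitutions for common identifier punctuation.
    return "".join(encoded).translate(str.maketrans({"/": "=", ":": "+", ".": ","}))


def pairtree_path(identifier: str, root: str = "pairtree_root") -> str:
    """Split the cleaned identifier into two-character 'shorties' under root."""
    cleaned = pairtree_clean(identifier)
    shorties = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return "/".join([root, *shorties]) + "/"


if __name__ == "__main__":
    print(pairtree_path("ark:/13030/xt12t3"))  # -> pairtree_root/ar/k+/=1/30/30/=x/t1/2t/3/
```

The appeal of the design is that the path can be computed from the identifier alone, with no database lookup, and no single directory ends up holding millions of entries.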

There was also a thread in this meeting on the use of digital forensics, transitioning some tools and practices from legal digital forensics into archival digital forensics. This interested me very much and I intend to read up in this area.

Thursday, April 02, 2009

new flip book beta

From Peter Brantley on the OCA blog -- A new beta version of the Flipbook bookreader has been released open source under a GNU license. The source code is available from the Open Library site.

Wednesday, April 01, 2009

LC/CLIR report on pre-1972 sound recording copyright

Excerpted from the press release:

Sound recordings were not protected by federal copyright law until 1972. A Library of Congress report indicates that the miscellany of state laws protecting pre-1972 sound recordings will extend copyright protection until 2067, creating a situation where some recordings dating to the 19th century are not available in public domain.

The Library announced today the completion of a commissioned report that examines copyright issues associated with unpublished sound recordings. This new report from the Library of Congress and the Council on Library and Information Resources addresses the question of what libraries and archives are legally empowered to do, under current laws, to preserve and make accessible for research their holdings of unpublished sound recordings made before 1972.

The report, "Copyright and Related Issues Relevant to Digital Preservation and Dissemination of Unpublished Pre-1972 Sound Recordings by Libraries and Archives," is one of a series of studies undertaken by the National Recording Preservation Board (NRPB), under the auspices of the Library of Congress. It was written by June Besek, executive director of the Kernochan Center for Law, Media and the Arts at Columbia University. The report is available free of charge at

Friday, March 27, 2009

New LC multimedia collection sharing initiatives

This is news ... The Library of Congress will begin sharing content from its vast video and audio collections on the YouTube and Apple iTunes web services as part of a continuing initiative to make its incomparable treasures more widely accessible to a broad audience. The new Library of Congress channels on each of the popular services will launch within the next few weeks.


The General Services Administration today also announced agreements with Flickr, YouTube, Vimeo and that will allow other federal agencies to participate in new media while meeting legal requirements and the unique needs of government. GSA plans to negotiate agreements with other providers, and the Library will explore these new media services when they are appropriate to its mission and as resources permit.

Read the Press Release.

Tuesday, March 24, 2009

Jenny Holzer

Even though I have already posted for Ada Lovelace Day, an exchange I had with a colleague earlier today led me to want to post about someone else. I accidentally printed the entirety of a lengthy PowerPoint presentation. After the pages I actually needed had printed, I canceled the print job and went to my meeting. When I got back there was a stack of messed up printouts from the failure of the print job to, well, fail gracefully. There were pages of random letters in random length rows. A colleague saw me staring at one of the pages and exclaimed "text art!" I immediately thought of Jenny Holzer.

Jenny Holzer is famous for her text-based art featuring short statements, or "truisms." Some are well known cliches while others are random phrases or slogans or excerpted phrases from larger texts or documents. Her work explores the use of words and ideas in public spaces. She works in a variety of media, including large scale xenon projections, LED signs, the Internet, plaques, benches, stickers, T-shirts, and street posters. I cannot begin to describe how mesmerizing her work is, whether a large-scale projection or an immersive gallery space. For over thirty years she has joined ideology with space through text using technology. I just found out that she is on Twitter.

I cannot remember where I first encountered her work. It may have been at SFMoMA. It may have been at the DeCordova Museum in Lincoln, Massachusetts (one of my absolute favorite museums that not enough people know about). Or it might have been at Mass MoCA. I very much want to see the exhibition of her work at the Whitney, on display through May 31, 2009. There's a detailed review in the New York Times.

Ada Lovelace Day

I have had the pleasure in my life of working with a number of strong (and strong-willed) women who have seen me through various stages of my career. On the occasion of Ada Lovelace Day, I'd like to write about a colleague who I have known for many years, although we only had the opportunity to work in the same place for 4 weeks: Caroline Arms.

Caroline joined the Library of Congress in 1995 to work on the American Memory project. While the initial focus of the project was digitization and access, she saw the underlying issue that was created by such an effort: preservation. There was a profound lack of awareness in the library world about digital preservation at the time.

Caroline thought long and hard about the life cycle of digital objects, focusing in particular on one of the most vital areas that have consequences for all preservation efforts: standards for metadata and file formats. Preservation is always easier if good choices are made about digital formats. Curators should make collection decisions knowing which formats will and won’t be easily sustainable. For an object to be useful long into the future, its formats should be carefully selected and the specifications and characteristics of its formats must be documented.

Caroline and LC colleague Carl Fleischhauer’s exhaustive format research led to their creation of the Digital Formats web site, the first definitive inventory of information about current and emerging digital formats. The site is an essential resource for the international digital preservation community. Caroline also made a concerted effort to promote the use of formats with open standards, and to shepherd file formats through the standards review process.

She was also involved with the development of the Open Archives Initiative Protocol for Metadata Harvesting. I first met Caroline working on a collaborative OAI harvesting project, and I owe much of my expertise to her mentoring.

It was a great loss to LC that Caroline retired in June 2008. She did not retire from the community, however, and is participating in an LC group looking at metadata even now.

Thank you, Caroline.

Friday, March 20, 2009


I have been suffering through a state of ennui of late. Low energy, not feeling like cooking, a short attention span for reading, lack of interest in TV shows I usually enjoy, and a strong desire to work at home, curled up on the sofa with cats. Not even getting a great deal on some fabulous shoes to wear once sandal season returns, or going to a farmer's market we'd never visited before and finding -- wonder of wonders -- bacon sage ravioli, has cheered me up much. Tasks that usually give me a strange sense of accomplishment, like a successful presentation for a group at work or being caught up on the laundry or finally depositing a stack of checks for small denominations that I kept accumulating to take to the bank all at once, have done little to enhance my mood for long.

It's partly the cold, March, why-isn't-it-really-spring-yet doldrums. It's just past the one year anniversary of putting our house in Charlottesville on the market. I have also spent the lion's share of my time doing almost nothing but writing. In the past few weeks I have written a chapter for a book, revised a conference paper, written two conference proposals, and written 3 lengthy technical documents for one project alone, not to mention sending countless emails. All that writing and spreading myself across Twitter, Facebook, and this blog has had the effect of cutting down on posting overall.

Successfully (I hope) completing my first Japanese course next week will help. And two projects I've been working on have launches next month -- getting those out the door and having the chance to talk about them will almost certainly improve my outlook.

Sunday, March 01, 2009

copyright registries

I attended a great presentation by Siva Vaidhyanathan and James Grimmelmann at Georgetown University last Friday on the Google Book Search settlement. The question that I most wanted to raise during the discussion period (why did the facilitator never call on me?) was about their opinions on the proposed registry. This seems to me to be one of the topics most in need of clarification in the settlement.

I chatted with both of them afterwards. I worry about a potential lack of transparency of the registry's contents and its mode of operation. I have heard Dan Clancy from Google say that it will not be made fully publicly available.

While there, a student from the University of Michigan School of Information mentioned Michigan's IMLS grant supported effort to create a Copyright Review Management System to increase the reliability of copyright status determinations of books published in the United States from 1923 to 1963. Last week Lorcan Dempsey was blogging about the OCLC Copyright Registry Evidence Initiative. Stanford has a Copyright Renewal Database. John Mark Ockerbloom at the University of Pennsylvania researched periodical renewals in addition to posting scans from many volunteer institutions (including Carnegie Mellon's and Project Gutenberg's extensive work) in his Catalog of Copyright Entries. The U.S. Copyright Office has records from 1978 onward online.

So, where does a library (or anyone, for that matter) go to research the copyright status of a published work? One of these places? All of these places? And where might the ownership status of orphan works someday be researched and recorded and made public? What will be the most authoritative source? Will there be open resources and less open resources? This looks like an area where there might be too much competition, almost a splintering of attention that calls out for a sense of coordination in the community.

recent reading

Some reports and posts that caught my attention recently:

The Andrew W. Mellon Foundation released a progress report from the DuraSpace project, a joint project of the DSpace Foundation and the Fedora Commons.

"MetaTools - Investigating Metadata Generation Tools" from JISC.

Merrilee Proffitt from RLG/OCLC posted on the "Legal and Ethical Implications of Large-Scale Digitization of Manuscript Collections" symposium at UNC-Chapel Hill. Posting Part 1 and Posting Part 2.

Andrew Richard Albanese published an article for Library Journal called "Institutional Repositories: Thinking Beyond the Box." It's a very balanced presentation of a number of points of view on the failure and success of IRs.


Via TeleRead, I found an essay about eReading devices by Jennifer Chapelle on treocentral. The piece, "Centro, iPhone, and that Other Reading Device (Kindle 2)," briefly describes her experiences with a Centro and an iPhone, focusing on the new Kindle 2.

Overall, she liked it. But she's not throwing away her other devices.

If you've ever been interested in getting an eReader type of device, I can definitely recommend the Kindle 2. It's not the cheapest gadget, but it does have a lot of features, and don't forget that 3G Sprint radio inside. If you want an eReader that is thin, lightweight, fast, looks great, has a built-in dictionary and a battery saving sleep-mode with some cool portraits, the Kindle 2 from Amazon is a great choice.

And if you don't care about those eReaders like the Kindle and the Sony device, just stick with your Treo or Centro. Those are great little eBook readers! And we know all the other great stuff you can do on them like talking on the phone, texting, writing documents, listening to music, taking photos, surfing the internet on decent looking web browsers, playing games, etc. My Centro and Treo Pro will be staying right by my side, Kindle or no Kindle.

I saw an interview with Jeff Bezos on Charlie Rose last week, which was primarily a discussion of the Kindle 2. My take-away is that the killer feature for the Kindle is the wireless purchasing of books that does not require a PC. Bezos is also a huge fan of the ability to bookmark your location in a text on your Kindle, and when you pick up another of your Kindles, the devices will sync up and you will find the same bookmark. Interesting, but I'm not sure I understand yet why you would have more than one. One at home and one at work? One downstairs and one upstairs? It's already portable. The functionality that they are working on where you can sync between your Kindle and a reader app on a cell phone and back interests me more. His example was reading on a cell phone while waiting in line at the grocery store, and having your Kindle aware of your new bookmark once you get home. That use case works better for me.

His statement that he wants to deliver "Every book ever in print in any language" gives me pause. That feels potentially monopolistic for the eBook distribution sector. Well, at least for their proprietary AZW ebooks. But if theirs becomes the most successful pipeline for eBooks, will other creators and distributors of other formats be able to compete? I can only assume the open access eBook realm will not fade away.

I found myself looking at the Sony eReader a week ago. The touchscreen and non-touchscreen versions each have their own usability issues. The touchscreen is the better of the two, and supports annotation. It supports more file formats than the Kindle. It requires a PC and has no wireless features. And it runs on MontaVista Linux, which a member of my family worked on a couple of years ago.

For now at least I plan to continue to read books on my Centro. I have about 3 dozen books, some recent, some classics. And I haven't divested myself of my nearly 3,000 dead tree books. Or my library cards.

Sunday, February 22, 2009

Caldwell collection

The Cooper-Hewitt Library is celebrating the release of Shedding Light on New York: Edward F. Caldwell Collection. The collection contains more than 50,000 images consisting of approximately 37,000 black & white photographs and 13,000 original design drawings of lighting fixtures and other fine metal objects that they produced from the late 19th to the mid-20th centuries.

Caldwell & Co. was America’s premier producer of lighting and other metal objects from the turn of the 20th century through the 1940s, and the archives are currently stored in the Cooper-Hewitt National Design Museum Library in New York City. Notable clients of Caldwell lighting fixtures included the Rockefellers, the Carnegies, and the Roosevelts, and the company was also commissioned for famous landmarks such as the Grand Central Terminal, Radio City Music Hall, and the Waldorf-Astoria in New York City. Caldwell & Co. manufactured unique and intricate lighting fixtures in their Manhattan factory, such as chandeliers, electrified lamps and wall sconces, which were then shipped to prominent residences all over the United States.
The New York Public Library also has Caldwell & Co. records.

Saturday, February 21, 2009

Catalogue of Digitized Medieval Manuscripts

A team at UCLA has launched the Catalogue of Digitized Medieval Manuscripts, a centralized online archive of holdings worldwide.

The Catalogue first began to take form in Christopher Baswell's talk at the MLA conference in December, 2005. Generous support by the Center for Medieval and Renaissance Studies at the University of California, Los Angeles, has enabled Professors Matthew Fisher and Christopher Baswell to develop this site, and make it publicly available in its current form through the CMRS web site. An additional grant from the UCHRI (University of California Humanities Research Institute) made possible additional data entry, and substantive refinements to the back-end technologies in place.
Eventually, the site will have a collaborative layer of some sort, so that scholars can share their expertise with other researchers and with libraries, which do not always have the most accurate information for each manuscript, according to Mr. Fisher. He’d like the catalog to provide a general set of digital tools, too, so that similar databases can be built in other fields.
To date the project has located over 5,000 digitized manuscripts, and over 1,000 have been cataloged for inclusion. An article in the Chronicle of Higher Ed provides background on the project.

Friday, February 20, 2009

FDsys federal content management system

Via Open Access News, the FDsys (Federal Digital System) of the US Government Printing Office (GPO) has entered its public beta. FDsys is an advanced digital system that will enable GPO to manage Government information in a digital form, and enable GPO to manage information from all three branches of the U.S. Government.

For more detail, see Joab Jackson's article about it in Government Computer News, February 5, 2009. There are five major releases planned over the next three years.

Duke Library's Trident Metadata Tool

The Duke University Library is blogging about its Trident Metadata Tool development. Their February 13 post is the first on their architecture.

good week for open source releases

The Indiana University Library has released open source software to create a digital music library system.

Indiana University today announces the release of open source software to create a digital music library system. The software, called Variations, provides online access to streaming audio and scanned score images in support of teaching, learning, and research.

Variations enables institutions such as college and university libraries and music schools to digitize audio and score materials from their own collections, provide those materials to their students and faculty in an interactive online environment, and respect intellectual property rights.

A key feature of the system for faculty and students is the ability to create bookmarks and playlists for use in studying or in preparing classroom presentations, allowing easy access later on to specific audio time points or segments. A key feature for libraries is a flexible access control and authentication system, which allows libraries to set up access rules based on their own local institutional policies.

This software is the culmination of nearly fifteen years of development and use of digital music library systems at Indiana University. Creation of the current Variations software platform was originally funded by the National Science Foundation. In 2005, the Institute of Museum and Library Services awarded Indiana University a National Leadership Grant to extend this highly successful system to the nationwide library community. Beyond IU, the software is currently being used at the Ohio State University, University of Maryland, New England Conservatory of Music, and the Philadelphia area Tri-College Consortium (Haverford, Swarthmore, and Bryn Mawr).

This open source release of Variations complements IU’s earlier release of the open source Variations Audio Timeliner, which lets users identify relationships in passages of music, annotate their findings, and play back the results with simple point-and-click navigation. This tool is also included as a feature of the complete Variations system.

Indiana University plans to offer a free one-hour Variations webinar at 4:00 PM EST on March 4, 2009 for institutions and individuals interested in learning more about the system. To register, e-mail

The Indiana University Digital Library Program created Variations in collaboration with faculty and students in IU’s Jacobs School of Music. The IU Digital Library Program is a collaborative effort of the Indiana University Libraries and the Indiana University Office of the Vice President for Information Technology.

For more information on the Variations open source release, see:

The Washington Times released some Django open source tools (has a newspaper even released open source software before?):

The Washington Times has always focused on content. After careful review, we determined that the best way to have the top tools to produce and publish that content is to release the source code of our in-house tools and encourage collaboration.

The source code is released under the permissive Apache License, version 2.0. The initial tools released are:

  • django-projectmgr, a source code repository manager and issue tracking application. It allows threaded discussion of bugs and features, separation of bugs, features and tasks and easy creation of source code repositories for either public or private consumption.

  • django-supertagging, an interface to the Open Calais service for semantic markup.

  • django-massmedia, a multi-media management application. It can create galleries with multiple media types within, allows mass uploads with an archive file, and has a plugin for fckeditor for embedding the objects from a rich text editor.

  • django-clickpass, an interface to the OpenID service that allows users to create an account with a Google, Yahoo!, Facebook, Hotmail or AIM account.

The web site will be hosting the code and issue tracking software, using django-projectmgr.

Sunday, February 08, 2009

Yiddish books online

In October 2008 at an Open Content Alliance meeting, I saw a presentation about the National Yiddish Book Center. It has just been announced that over ten thousand Yiddish texts -- estimated as over half of all the published works in Yiddish currently in existence -- are now available online through a joint venture with the Internet Archive. From the press release:

The National Yiddish Book Center is proud to offer online access to the full texts of nearly 11,000 out-of-print Yiddish titles. You can browse, read, download or print any or all of these books, free of charge. These titles were scanned under the auspices of our Steven Spielberg Digital Yiddish Library, and have been made available online through the Internet Archive.

Original, used copies and new, print-on-demand hardcover reprints of most titles in our collection are available at nominal cost.

Some of the rights issues are apparently unclear, but it seems so important to make this collection available -- works written in an at-risk language that were at one point systematically destroyed -- that any potential legal risk is worthwhile in my mind.

A brief announcement appeared in the New York Times.

Thursday, February 05, 2009

wikipedia loves art

The Smithsonian American Art Museum has announced its participation in a really interesting initiative -- help illustrate Wikipedia articles with your images of art from their collection. Photograph items from the collection, following some guidelines, upload your images to flickr, and your images will likely be used to illustrate a Wikipedia article.

Over the next month we are participating in Wikipedia Loves Art, a scavenger hunt and free content photography contest among 15 museums and cultural institutions worldwide. The project, in conjunction with Flickr, is aimed at illustrating Wikipedia articles. The event is planned to run for the whole month of February 2009.

We're inviting you to come into the museum and shoot photos of our artworks based on various themes. You can shoot on your own or form a small team (10 people, tops). The photogs or teams with the most points will win prizes.

The details about participation are available at Wikipedia.

This is part of the larger Wikipedia Loves Art project where a number of museums are participating in a scavenger hunt. The only thing that is not clear to me is what the Wikipedia articles that these images will illustrate are about. Scholarly topics? Topics related to art and art history? Articles about the museums or the specific works of art? I am curious.

DCC paper on interoperability

The Digital Curation Centre has released a short briefing paper on interoperability. It's a good, brief primer on the basic issues.

JHOVE2 requirements available

The latest version of the JHOVE2 Functional Requirements has been posted. I'm still interested in what isn't documented yet, e.g., the final list of formats that will be supported.

Sunday, January 25, 2009

National Film Board of Canada puts archives online

The National Film Board of Canada (NFB) has opened up its archives - more than 500 films, clips and trailers are now available on their new Screening Room web site. They're freely available for online viewing (there are costs for public broadcast and educational use), with more to be added regularly.

the burden of twitter

Steven Levy has written an essay for Wired about the guilt one can feel for not participating enough in one's social network: following tweets but not twittering, not blogging often enough, or not updating one's Facebook status. It's a brief but interesting read on privacy and a weird sense of duty to keep those public lines of communication open.

Nicholas Carr has posted a very interesting reaction to Levy's essay.

There's an arrogance to sharing the details of one's life in public with strangers - it's the arrogance of power, the assumption that such details somehow deserve to be broadly aired. And as for the people, those strangers, on the receiving end of the disclosures, they suffer, through their desire to hear the details, to hungrily listen in, a kind of debasement. At the risk of going too far, I'd argue that there's a certain sadomasochistic quality to the exchange (it's a variation on the exchange that takes place between celebrity and fan). And I'm pretty sure that Levy's remorse comes from his realization, conscious or not, that he is, in a very subtle but nonetheless real way, displaying an undeserved and unappetizing arrogance while also contributing to the debasement of others.
This seems a bit strong to me, but not entirely off base. Arrogance of power? Debasement? Sadomasochistic? OK, that may be true for some who participate in social networking, just as it may be for some participants in real-life communities. There is something a bit egotistical in assuming that others will follow your tweets/blog/delicious tags/flickr set/facebook. There is something a bit creepy that, if you don't require approval, complete strangers read your tweets where you might be discussing where you are at any given time. I like to think that most use social networking to actually keep in touch, not to obsessively stalk one another.

There's that public sharing expectations thing again. I know, I think about this a lot. People I do not know read my blog, see many (but not all) of my flickr images, and join my delicious network to see most (but again, not all) of my bookmarks. I have made a conscious decision to share these things. I had to struggle with getting over the creepiness factor. It was well over a decade ago that a woman from China, upon being introduced to me at a conference reception, exclaimed "Oh! I know who you are -- you have an interest in folk art and you like armadillos!" She had come across my personal web page (remember those?) while researching the conference speakers.

There's no turning back. There's only self-selecting your level of exposure.

Folger Library launches online image collections

The Folger Shakespeare Library just expanded access to its Digital Image Collection by offering over 20,000 images online. The collection includes books, theater memorabilia, manuscripts, art, and 218 of the Folger’s pre-1640 quarto editions of the works of William Shakespeare.

Online use is through the Luna Insight Browser -- you have to add an exception to your popup blocker or the software will not function properly. To access their Shakespeare Quartos collection and to get full functionality (saving searches, exporting html pages) you have to install the free Insight Java client.

They have a "how-to" page and search tips available.

Library of Congress SourceForge release

Last month the Library of Congress had a soft launch of an open source software release. We officially announced the release in the January 2009 issue of the Library of Congress Digital Preservation. This is the first software that the Library has formally released as open source.

The tools are available through SourceForge under the “Library of Congress Transfer Tools” project. The project includes tools for use with the BagIt specification, a hierarchical file packaging format for the exchange of digital content jointly developed by the Library of Congress and the California Digital Library.
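For anyone who hasn't seen a bag, the basic layout is simple: a `bagit.txt` declaration, a `data/` directory holding the payload, and a manifest listing a checksum for each payload file. The Python sketch below builds that minimal layout. It is not the Library's code; `make_bag` is a hypothetical helper, and the version string and the choice of MD5 are assumptions for illustration, so check the version of the BagIt specification you are targeting before relying on the details.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir, payload, algorithm="md5", version="0.97"):
    """Write a minimal bag: bagit.txt, a data/ payload, and a payload manifest.

    payload maps relative POSIX paths (inside data/) to file contents as bytes.
    The version and algorithm defaults are assumptions for illustration.
    """
    bag = Path(bag_dir)
    data_dir = bag / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for rel_path, content in sorted(payload.items()):
        target = data_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        # Each manifest line: checksum, whitespace, path relative to the bag root.
        digest = hashlib.new(algorithm, content).hexdigest()
        manifest_lines.append(f"{digest}  data/{rel_path}\n")
    # The bag declaration names the BagIt version and the tag-file encoding.
    (bag / "bagit.txt").write_text(
        f"BagIt-Version: {version}\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag / f"manifest-{algorithm}.txt").write_text("".join(manifest_lines), encoding="utf-8")
    return bag


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = make_bag(Path(tmp) / "example_bag", {"a.txt": b"hello\n", "sub/b.txt": b"world\n"})
        print((bag / "manifest-md5.txt").read_text(encoding="utf-8"))
```

Because the manifest travels with the content, the receiving end can check the transfer without any out-of-band information.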

Three tools developed by the Library's Repository Development Group are available now. Parallel Retriever implements a simple Python-based wrapper around wget and rsync to optimize the transfer of content between locations through parallelization. It supports rsync, HTTP, and FTP transfers. Bag Validator is a Python script that validates a Bag, checking for missing files, extra files, and duplicate files. VerifyIt is a shell script that verifies file checksums within a Bag manifest using parallel processes.
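The kinds of checks that Bag Validator and VerifyIt perform can be approximated in a few dozen lines of Python. This is a hypothetical sketch, not the released tools: it assumes a single `manifest-md5.txt`, treats "duplicate files" as a path listed more than once in the manifest (one possible reading), and uses a thread pool rather than separate processes to hash files in parallel.

```python
import hashlib
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_manifest(bag, algorithm="md5"):
    """Return (checksum, path) pairs from manifest-<algorithm>.txt."""
    entries = []
    with open(Path(bag) / f"manifest-{algorithm}.txt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                checksum, rel_path = line.split(maxsplit=1)
                entries.append((checksum.lower(), rel_path))
    return entries


def validate_bag(bag_dir, algorithm="md5"):
    """Compare manifest entries with the payload files actually on disk."""
    bag = Path(bag_dir)
    seen, duplicates = set(), set()
    for _, rel_path in read_manifest(bag, algorithm):
        if rel_path in seen:
            duplicates.add(rel_path)  # listed more than once in the manifest
        seen.add(rel_path)
    on_disk = {f.relative_to(bag).as_posix() for f in (bag / "data").rglob("*") if f.is_file()}
    return {
        "missing": sorted(seen - on_disk),  # in the manifest, not on disk
        "extra": sorted(on_disk - seen),  # on disk, not in the manifest
        "duplicates": sorted(duplicates),
    }


def file_checksum(path, algorithm="md5"):
    """Hash a file in 1 MiB chunks so large files are never fully in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(bag_dir, algorithm="md5", workers=4):
    """Hash manifest-listed files in parallel; return paths whose checksum differs."""
    bag = Path(bag_dir)
    entries = [(c, p) for c, p in read_manifest(bag, algorithm) if (bag / p).is_file()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        actual = list(pool.map(lambda entry: file_checksum(bag / entry[1], algorithm), entries))
    return sorted(p for (expected, p), got in zip(entries, actual) if got != expected)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp)
        (bag / "data").mkdir()
        (bag / "data" / "a.txt").write_bytes(b"hello\n")
        good = hashlib.md5(b"hello\n").hexdigest()
        (bag / "manifest-md5.txt").write_text(f"{good}  data/a.txt\n", encoding="utf-8")
        print(validate_bag(bag), verify_checksums(bag))
```

Threads are a reasonable choice in this sketch because CPython's hashlib releases the GIL while hashing large buffers; a process pool, closer to VerifyIt's use of parallel processes, may do better on many small files.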

The Library plans to release additional tools as part of a suite of solutions and software development resources as they are completed over time. There are already more tools in the pipeline.

Friday, January 23, 2009

mobile is the new black

There's a new WorldCat Mobile pilot service.

NYPL has announced its NYPL Mobile beta.

The DC Public Library launched an iPhone app.

Stanford has released a new version of its iStanford iPhone app, which ties into its student services system.

The International Children's Digital Library launched an iPhone app last November.

technology transition at the white house

There was a great Washington Post article yesterday about how White House technology is "in the Dark Ages." I laughed bemusedly over my toast and read the article aloud at the breakfast table.

The White House is not being singled out. I work for a Federal Agency. I know folks who work at numerous other Federal Agencies, some of whom have worked at said agencies for decades. Federal agencies have many, many rules about hardware and software security, and every agency has to interpret and enforce those rules itself. Differing security levels for content muddy the waters further, which can cause a certain amount of confusion about what is and isn't allowed. Someone told me that their agency (not the White House) hasn't yet approved Firefox. News that the White House counsel's office approved use of Gmail accounts for some press office activities has been forwarded to many Federal IT units, I'm sure.

Edit, 25 January: Wired has posted a Wired/Tired overview of White House tech, and a list of recent technology projects from various agencies. Nice to see the shout-out for the LoC Flickr project.

Thursday, January 15, 2009

oclc summary of proposed google book settlement

Ricky Erway from OCLC has distilled the proposed Google Book Settlement, its appendices, and the three library registry agreements from 320 pages to a 4 1/2 page summary. It's an excellent overview of the proposal.

d-lib article on some LC tool development

My colleague Justin Littman has just published an excellent article in the January/February 2009 issue of D-Lib Magazine: "A Set of Transfer-Related Services."

"The Office of Strategic Initiative's (OSI) Repository Development Team (RDT) is developing a portfolio of services and components to address the challenges posed by scaling transfer processes. While the portfolio is expanding, the focus of this article will be on two core services, the Inventory Service and the Workflow Service. Before proceeding to examine these services, it will be useful to further delineate the transfer problem space. After examining these services, their role in mitigating preservation risks will be considered."

Monday, January 12, 2009

presidential records and donation reform

On January 7, 2009, the U.S. House of Representatives approved H.R. 35, the "Presidential Records Act Amendments of 2009," and H.R. 36, the "Presidential Library Donation Reform Act of 2009." These were chosen by the House leadership as the first pieces of substantive legislation passed in 2009 as a symbol of government transparency.

The Presidential Records Act Amendments restores meaningful public access to presidential records by nullifying a 2001 Bush executive order, and the Presidential Library Donation Reform Act requires the disclosure of big donors to presidential libraries. The Senate still has to pass its versions of the bills before they can go to soon-to-be-President Obama, who has apparently indicated that he will sign them.

The National Coalition for History provides a good overview of the Records Reform Act. The House Speaker's site provides an overview of both.

Sunday, January 11, 2009

I want a Palm Pre

Pretty much everyone who knows me knows my loyalty to Palm. I've had one since 1997. I've been syncing it with enterprise calendar systems, so I have my personal and work calendars going back to February 1996. I currently have a Centro, and I am still devoted to my Palm, even though I have to manually key in all my work events because we use such an old version of GroupWise that I can't find a sync that works.

I cannot count the number of friends and colleagues who have iPhones and have done their best to convert me. The Urbanspoon app and its clever use of the accelerometer to randomize recommendations by shaking the phone almost had me. My answer is always that if they can promise me that I can port over everything I have on my Centro -- 13 years of calendar, hundreds of contacts, notes, to-do lists, and ebooks -- then I'll consider it.

I am now waiting with bated breath for the Palm Pre. It's such a step forward in interface (a card stack metaphor), operating system (webOS is built on Linux), and browser (based on WebKit). It doesn't use Palm Desktop anymore, which will be a paradigm switch for me, but Palm has promised data migration tools for Centro users.

I know that some apps I use won't work anymore, and Raymond's concern about the lack of backwards compatibility in the OS is a real one. But they need to move on and I need to move on, and I'm glad it looks like it will be to another Palm. You can't always be fully backwards compatible. Hey, I only complained a little when I discovered I couldn't run FileMaker Mobile on my Centro, didn't I?

Reviews at Gizmodo, ars technica, PC World, and a PC World FAQ.

Saturday, January 03, 2009

on electronic texts

I just read an article at Information Today by Nicholas Tomaiuolo, an instruction librarian at Central Connecticut State University, entitled "U-Content: Project Gutenberg, Me, and You." He outlines the requirements and steps for preparing an etext for Project Gutenberg.

At one point in the article, there is a discussion about the requirement for full text, not just a PDF created from page images. The author wrote from the point of view of someone unfamiliar with PG's requirements, illustrating the process one might follow to create an acceptable PG submission -- images to PDF, and images to OCR to corrected plain text. Reading it, I found myself thinking quite a bit about the often-heard statement (not in this article, mind you) that PDF is the ultimate format for texts.

I'm in no way denigrating PDF. PDF is an absolutely required format for texts. PDF is highly portable and shareable and readable, and, if the source files are good enough, clearly printable. But it's not innately analyzable or easily repurposed. That requires full text.

I am not unfamiliar with what it takes to create an accurate plain text transcription of a text. When Gutenberg was in its early days, we were really talking about transcriptions, as in people typing in text. OCR has greatly streamlined that process, but the proofreading required is non-trivial. Want to work with a highly formatted text, or one with tables or formulae or figures? Challenging. Adding layers of structural and semantic markup to plain text, as with TEI, is time-consuming. Rich markup, including identifying dates or names or geographical places, or providing normalized versions of said dates and names, is a large undertaking. A full text with structural and semantic markup can be repurposed into many formats, including ebooks and PDF.

And you do want ebooks. Some months ago I had the great opportunity to demonstrate the prototype World Digital Library site at the National Book Festival. There is no greater focus group than thousands of people who love to read! The two top requests were that the books should be downloadable as ebooks and that all the text content be available as full text in all seven project languages. These were not academics or librarians (although there were some of the former and many of the latter who stopped by), but parents and commuters and researchers and genealogists.

Both are daunting requests when you do not have full text available to work from. There will be PDFs. The others are goals to strive for.

Friday, January 02, 2009

flickr commons developments

I am not part of the Library of Congress Flickr project team, and I in no way speak for them.

There is a lengthy discussion on Wired and at Found History about The Commons on Flickr, given that Yahoo laid off George Oates, the key staff member who shepherded the project. It doesn't seem that the project is in any imminent danger, but a dedicated group has stepped forward to evangelize, innovate, and curate thematic sets from across the collection. I'm thrilled to see a community of use developing around The Commons, but it's sad that this is what precipitated its full coalescence.

DCC obsolescent data and files challenge

There's still time to send a message to Chris Rusbridge at the Digital Curation Centre to enter his personal data recovery challenge:

I will do my best to recover the first half dozen interesting files that I’m told about… of course, what I really mean is that I’ll try and get the community to help recover the data. That’s you!

OK, I define interesting, and it won’t necessarily be clear in advance. The first one of a kind might be interesting, the second one would not. Data from some application common of its time may be more interesting than something hand-coded by you, for you. Data might be more interesting (to me) than text. Something quite simple locked onto strange obsolete media might be interesting, but then again it might be so intractable it stops being interesting. We may even pay someone to extract files from your media, if it’s sufficiently interesting (and if we can find someone equipped to do it).

The only reference for this sort of activity that I know of is (Ross & Gow, 1999, see below), commissioned by the Digital Archiving Working Group.

What about the small print? Well, this is a bit of fun with a learning outcome, but I can’t accept liability for what happens. You have to send me your data, of course, and you are going to have to accept the risk that it might all go wrong. If it’s your only copy, and you don’t (or can’t) take a copy, it might get lost or destroyed in the process. You’ll need to accept that risk; if you don't like it, don't send it. I might not be able to recover anything at all, for many reasons. I’ll send you back any data I can recover, but can’t guarantee to send back any media.

The point of this is to tell the stories of recovering data, so don’t send me anything if you don’t want the story told. I don’t mind keeping your identity private (in fact good practice says that’s the default, although I will ask you if you mind being identified). You can ask for your data to be kept private, but if possible I’d like the right to publish extracts of the data, to illustrate the story.

His deadline is Twelfth Night, January 5, 2009. Read his post for the rest of the details.

I've been thinking a lot lately about data migration and recovery, and I think this is a great way to illustrate the challenges and solutions -- with real data and the stories behind its creation, loss, and potential recovery.


I am definitely a proponent of the self-made. Cooking, clothing, jewelry, assisted forays into small electronic projects, etc. At various times I have designed and made costumes, painted, made paper, etched prints, made lampwork glass beads, and learned the basics of Chinese brush painting. I haven't had as much time for that in the last couple of years, but that doesn't mean that I don't still strive to make.

I'm a big fan of Make, and so are a lot of others. In that vein, William Turkel has posted his list of books for humanist makers.

interview with Vint Cerf

Siva Vaidhyanathan has posted his interview with Vint Cerf about developments in search technology and Google.

Thursday, January 01, 2009


I don't know if these are resolutions or goals ...

I have been getting back to writing over the past few weeks, and I will write more about what we're working on to get the word out.

I will be a more hard-nosed project manager vis-a-vis deadlines. I am too often too understanding of delays.

I will explore Washington D.C. more. I lived in metro D.C. for part of my childhood and have been in Virginia for 7 years, but still have a disconnected sense of the city. Perhaps that's in part an artifact of traveling by Metro and not having a real sense of the city's topography.

I will meet more people in other divisions at LoC. This is harder than it sounds.

I will eat lunch at my desk less often. I will make more of an active effort to organize social opportunities. (Why don't I remember this every time I think about switching jobs? It gets harder and harder to build a new social network every time.)

I have faith that our house in Charlottesville will finally sell.

EDIT: There is another long term goal that I'm already working on: I've signed up for a Japanese class through the Federal employee graduate school. This is the year to begin transforming my ragtag knowledge of Japanese into usable language skills.