Wednesday, December 27, 2006

5 things you don't know about me

I don't often do memes, but ...
1. My first library job was in the 4th and 5th grades. I had to pass a test on alphabetization and the top level Dewey Decimal classes to work as a shelver and at the circ desk at my elementary school library.
2. During my freshman and sophomore years of college I worked at a Baskin Robbins in Los Angeles. Among my duties were cake decorating and making ice cream cakes and pies. I could still make a grasshopper pie if asked.
3. The previous item wouldn't be peculiar if it weren't for the fact that I'm lactose intolerant.
4. I trained as an archaeologist in graduate school and worked for museums for many years before moving into library work. All my work was focused on digital collections and automation, so my transition to digital libraries is not so odd. My first job while in graduate school was a recon project to transcribe written acquisitions records into a database and to create digital images of a major Moche pottery collection. In that previous life I also spent seven years on the board of the Museum Computer Network.
5. I collect Mexican folk art. Not in a systematic way, but when I see things that I really like it's hard to stop myself from buying them. I'm looking forward to hitting some galleries while in San Antonio for the Open Repositories 2007 conference.

Thursday, December 21, 2006

new for uva dl

For years I've been forwarding notices on articles, reports, sites, and what-not to various lists and groups at the Library. My colleague Cyril pointed out the obvious to me the other day when he and Ronda were presenting a session for library staff on -- that while the email messages were useful, an annotated and tagged set would be even more useful (and more persistent than messages in folks' inboxes).

Given that it's the week before Christmas and work was winding down for the break, I went through three years of outgoing messages to particular internal email lists (yes, I'm a compulsive email hoarder) and created a set:

I have a lot more to add, it needs some work on the tags, and I haven't created any bundles or set up any networks yet, but it's a start at pulling together things that I find useful and think my colleagues should know about.

Monday, December 18, 2006


I've spent way too much of my day exploring OCLC's FictionFinder prototype. Read more about the project.

The subject tag cloud that you encounter when first entering the system intrigues me -- the most commonly used subject appears to be "Marriage." I followed the subject "Missing children," vaguely thinking that I might encounter From the Mixed-up Files of Mrs. Basil E. Frankweiler. Nope. I searched for it and found that it's actually the subject "Runaway children." I wonder if I could have found it without knowing the title or that subject term? How do you know what subject term is the right one when browsing or searching? When is something under "Quakers" and when is it under "Society of Friends"?

I think that defaulting to genre for browsing is the right choice. It's a manageable length for browsing (at least for now), while the "subjects" list is quite long and "characters" is huge. See Thom Hickey's post on the difficulty in creating that character list. The awards list frustrated me momentarily -- I had to remember that the "Edgar" awards are actually the Mystery Writers of America awards and look under M.

The "settings" browse list results could be frustrating for some. I clicked on Mexico and found books where the subject was actually "New Mexico." It was great to see that books where the subject was "New Mexico -- Santa Fe" showed up under "New Mexico."

I searched on "voodoo," which is not the official subject term (it's voodooism, if you care to know). I got books where voodooism is a subject. I got books where voodoo is in the title. And I got books where voodoo is in the description, such as "By the author of Voodoo, Ltd." I know it's a tough problem to index both the assigned terms and the other fields where relevant subject topics might be found.
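One common way to attack that problem is to expand a query through the "see" references in a subject authority file before matching, then fall back to free-text fields. Here is a minimal sketch, not a description of how FictionFinder actually works: the one-entry `SEE_REFERENCES` table (built from the voodoo/voodooism pair in this post), the record layout, and the helper names are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical see-reference table: lead-in term -> authorized heading.
# Only the voodoo/Voodooism pair comes from the post; the table is illustrative.
SEE_REFERENCES: Dict[str, str] = {"voodoo": "Voodooism"}


def expand_query(term: str) -> List[str]:
    """Return the user's term plus any authorized heading it points to."""
    terms = [term]
    authorized = SEE_REFERENCES.get(term.lower())
    if authorized and authorized.lower() != term.lower():
        terms.append(authorized)
    return terms


def match_field(record: Dict, term: str) -> Optional[str]:
    """Name the field a record matched on, checking assigned subjects first."""
    wanted = {t.lower() for t in expand_query(term)}
    if any(s.lower() in wanted for s in record.get("subjects", [])):
        return "subject"
    # Free-text fields are searched with the user's own word only.
    for field in ("title", "description"):
        if term.lower() in record.get(field, "").lower():
            return field
    return None
```

Reporting which field matched lets an interface rank or label a subject match above an incidental mention like "By the author of Voodoo, Ltd."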

The FRBRization display for a work is a sensible one. Who knew that Gaston Leroux's Fantôme de l'Opéra was available in Thai? Nice to see that I could follow the edition into WorldCat to see that Cornell owns it.

I did come across many examples of works that should have been identified as the same but were not, all for a single author -- H. P. Lovecraft. When I had the same experience in LibraryThing some months ago, I spent some time cleaning up the work relationships. I wonder why his works are difficult to identify and combine programmatically?

How do I combine browse types? I'm looking for books set in England that feature ghosts. There's an advanced search but no advanced browse. Maybe something like at Amazon, where one can narrow within facets: jewelry --> rings --> gold --> emerald.
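To make the idea concrete, here is a minimal sketch of narrowing within facets, assuming each record carries a set of values per facet. The records, facet names, and functions are hypothetical; this is not FictionFinder's data model.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical records: each facet maps to a set of values.
BOOKS: List[Dict] = [
    {"title": "Book A", "settings": {"England"}, "subjects": {"Ghosts"}},
    {"title": "Book B", "settings": {"England"}, "subjects": {"Marriage"}},
    {"title": "Book C", "settings": {"Scotland"}, "subjects": {"Ghosts"}},
    {"title": "Book D", "settings": {"England"}, "subjects": {"Ghosts", "Marriage"}},
]


def narrow(records: List[Dict], **facets: str) -> List[Dict]:
    """Keep records carrying every requested facet value (AND across facets)."""
    result = records
    for facet, value in facets.items():
        result = [r for r in result if value in r.get(facet, set())]
    return result


def facet_counts(records: List[Dict], facet: str) -> Counter:
    """Count the values of one facet within the current result set."""
    return Counter(v for r in records for v in r.get(facet, ()))


in_england = narrow(BOOKS, settings="England")
ghosts_in_england = narrow(BOOKS, settings="England", subjects="Ghosts")
```

The counts are what let a browse screen show something like "Ghosts (2)" as the next step down from a setting, which is the Amazon-style drill-down described above.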

I don't want this to sound like a rant, because it isn't. I think this is a really promising prototype, both as a FRBR experiment and as a subject browse environment. The fact that you can generate the browse lists at all is exciting.

current issue of D-Lib

The December issue of D-Lib has two articles in particular that I found well worth my time.

The first is David Bearman's review of Jean-Noël Jeanneney's Google and the Myth of Universal Knowledge: A View from Europe. I've known David almost twenty years and I always find his issue pieces thoughtful.

The second is a very interesting article on the proposed draft audit checklist for repositories and OAIS. The Audit Checklist is still a draft after maybe two years. The outcome presented in this article is that even after annotating the checklist for use in an NDIIPP project, there were still issues in scoring the results and interpreting them.

We began this process by annotating the Audit Checklist and enlisting our team members to gauge their software installation experiences against it. Currently we are concluding a series of meetings to reach a consensus on the interpretation of checklist items. Using a test example scenario, we also experimented with applying an existing scoring instrument to the annotated Audit Checklist. This was an exercise that clarified the need for a more meticulous refinement of our annotated Audit Checklist, one that should be undertaken with the developers of the common repository software applications. Our experience thus far suggests that the application of 'weights' to the Audit Checklist items, specifically according to an institution's own needs and priorities, may also provide a framework for guiding a reiterative self-assessment process of an institution's repository services. Aside from this, as more institutions explore the possibility of providing trustworthy digital repository services, the evaluation of repository software applications increasingly will necessitate a more extensive, community-based expression of technical functional specifications needed to support the requirements of Trusted Digital Repositories. With an ever increasing array of potential software tools, services, and infrastructure configurations, the time is ripe for an evaluative approach to repository software that considers the array of items found in the Audit Checklist.
Array is right. The checklist has 86 items and four possible scores for each. This instrument is challenging to use, and even exceptionally experienced and well-qualified people have trouble agreeing on how to score it. Such a tool is definitely needed -- why is it so hard to design one?
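The article's suggestion of weighting checklist items by an institution's own needs and priorities can be sketched as a simple weighted average. This is a minimal illustration, not the scoring instrument the authors used: it assumes the four possible scores map to 0 through 3 and that unweighted items count as 1.0, and the item IDs and weights are invented.

```python
from typing import Dict

# Assumption: the four possible scores per item are encoded as 0-3.
MAX_SCORE = 3


def weighted_score(scores: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted share of the maximum possible score, from 0.0 to 1.0.

    Items with no entry in `weights` count with weight 1.0.
    """
    earned = 0.0
    possible = 0.0
    for item, score in scores.items():
        if not 0 <= score <= MAX_SCORE:
            raise ValueError(f"{item}: score {score} is outside 0..{MAX_SCORE}")
        weight = weights.get(item, 1.0)
        earned += weight * score
        possible += weight * MAX_SCORE
    return earned / possible if possible else 0.0


# Hypothetical item IDs: this institution doubles the weight of "A1"
# and halves the weight of "B1" to reflect its own priorities.
example = weighted_score({"A1": 3, "A2": 1, "B1": 2}, {"A1": 2.0, "B1": 0.5})
```

Weighting does not resolve disagreement over how to score an individual item, but it does make an institution's priorities explicit and repeatable across rounds of self-assessment.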

Thursday, November 30, 2006

project management software

Yesterday I was commiserating with a colleague about the complexities of MS Project, and how it was overkill for what we often needed -- tracking of a small number of tasks, the people assigned to them, deadlines, and a comprehensible dashboard type of report.

Today, I have seen a potential solution, and its name is dotProject.

For all I know I'm the last person on the planet to know about this, but another colleague just introduced me to dotProject, and it is very intuitive to use. Create a project, add tasks, set task parameters, create reports. It's web-based, easy to share with a team, and editable by the whole group.

It was demo'ed for our sys admin this morning, and he's agreed to a test install for us. He also found it super easy to use and was particularly impressed by its dashboard reporting features.

Check it out at System requirements are at

Wednesday, November 15, 2006

children's book week

Not too surprisingly, I was a constant reader as a child. When it came time for Scholastic book sales at my school, I would pore over the little catalog and select dozens of books, which my mother would usually make me pare down to no more than a dozen per order. Even so, teachers would express amazement over my orders, asking "How long will it take you to read all these, dear?" Stunned silence would follow when I'd reply with a very small number of days. Hey, these were my teachers -- didn't they know how fast I read?

I read way beyond my grade level, reading Hawthorne and Poe and Lovecraft in elementary school. I remember a short story in an Alfred Hitchcock-edited collection that terrified me, and likely still would today. I bought every book of folklore and ghost stories. I read A. A. Milne, Lewis Carroll, L. Frank Baum, Roald Dahl, Madeleine L'Engle, Andre Norton, and Maurice Sendak (Higglety Pigglety Pop!). I loved the Alfred Hitchcock and the Three Investigators books, Ruth Chew's Witch books, The Wonderful Flight to the Mushroom Planet, The Little Prince, The Mixed-up Files of Mrs. Basil E. Frankweiler, and The Phantom Tollbooth. There were some real oddities like The Forgotten Door and Stranger from the Depths.

If I could name _a_ favorite, it would be The Mixed-up Files of Mrs. Basil E. Frankweiler. I still want to live at The Met. The Phantom Tollbooth and Higglety Pigglety Pop! tie for a close second.

I still buy children's books occasionally. Every so often I come across one that I just feel the need to buy, like Armadillo Rodeo or Frankie's Bau Wau Haus. I didn't read The Mouse and His Child until two years ago.

Check out the "childrens" tag in my LibraryThing tag cloud. Sadly, my mother got rid of many of my books while I was in college. I still have some of them. A few I've replaced. I recently got a copy of a cookie baking book that I still think has the best recipe for snickerdoodles.

Children's Book Week

Tuesday, November 14, 2006

google news

It's now official -- the UVA Library is joining the Google book scanning initiative.

Here's the UVA press release:

And the identical Google press release:

We're very excited here. We're still feeling a bit overwhelmed as we start to think about the scale of the process, but excited nonetheless. It hasn't yet been decided when we'll start or what materials we'll send. It's going to make quite a change in our local digitization efforts.

Tuesday, October 24, 2006

uses versus users

Something else came up in a number of discussions here recently that really struck me. We were talking about different types of users -- faculty, graduate students, undergraduates -- when the topic turned in an interesting direction. No user is a single type -- a faculty member might be ordering reserves one day and looking for a DVD to check out for the weekend on another. A graduate student might be acting as a faculty member's proxy one day, doing their own research another day, or looking for beach reading in July. We all know this.

So, why don't we talk about uses rather than users?

Browsing. Searching. Research. Reserves. These cross many demographics. Let's analyze our services and interfaces from these standpoints, and not in terms of what "the faculty" or "the undergraduates" need. In other words, not a persona but a category of use. I don't have these ideas fully formed yet, but as someone who has spent a lot of time doing individual usability testing, I'm going to continue to give this some thought.

discussion of blogs and rss

I led the second in our "Not for Geeks Only" discussion series last week on blogs and RSS -- 40 people signed up! We seem to have really struck a nerve, and our Library staff seem pleased to be introduced to these topics in a personal, targeted way rather than wandering through every random page on the web trying to find what's relevant.

It's not comprehensive, but covers some blogs that I and my colleagues here use every day to help us do our jobs.

Coming up in the future -- Google Scholar and Google Book Search, flickr, tagging, and Firefox extensions.

planning our future(s)

For the past few weeks we have been planning for a visit from some folks from SirsiDynix, our ILS vendor. A number of people prepared presentations on various topics -- ILL requests, circulation, acquisitions, cataloging, user expectations -- and presented them over a two-day period.

It was highly illuminating.

We have a lot of staff who are very engaged with how we might improve not only our business operation and transactions, but how we might improve the user experience for our community. I was particularly impressed by the presentation on user expectations, which incorporated many Library 2.0 ideas that make sense for us -- rss feeds, "did you mean?" search facilitation, faceted browse, personalization, and recommender systems. While I was forewarned, it was still amusing to see the content of my blog entry of September 5 (and my picture) illustrating the first slide, which described how the ubiquity of digital services informs user expectations. I was described as the Library's "ubergeek." I cannot deny that I like being thought of that way. It's similar to the time in a job long ago when we were looking to create a new job title for me, and my boss suggested that it could simply be "Maven."

Another presentation that impressed me was the one dealing with cataloging. In part it was about improving efficiencies in the software used, but much of it was about how to best create shareable metadata that can be used for many purposes and by many systems. A great discussion followed about context and repurposing metadata and how to deal with authority control across systems.

There was also an excellent discussion about how our systems require too much data duplication. Why do we need to create and maintain tables of course names and numbers for reserves when student services already maintains such a data source? Why do we need to create tables to track payments when our procurement office already does that? Why do we need our own authentication system when the university has its own?

There was also an interesting presentation on ILL requests, which cried out for better implementation of standards and protocols so that our systems can communicate better and our staff don't have to do as much work manually as they do now. OCLC was mentioned many times.

If the presentations from that day are ever made publicly available, I'll post links.

Monday, October 02, 2006

it's all about the metadata

Last week I attended the NISO "Managing Electronic Collections" workshop. I spoke about our Digital Library Repository implementation, and was gratified to have a number of people ask me questions over the course of the two days that I was there. One question really struck me -- "What is the most important thing that you learned in your process that we should take into account in our project?"

It could almost be a one word answer: metadata.

Of course it's a more complex answer than that. What metadata do you need to capture? Technical, preservation, administrative, descriptive? In what format? What's the minimum? We have experimented a lot in this area, and there has been a certain amount of "lather, rinse, repeat" as we've refined our metadata. In some cases, encoding standards have changed, so mappings had to change. Or workflow tools have changed, requiring review of what metadata we can automatically capture, and in what form. Or standards have developed, such as those for preservation or rights metadata, so we need to review what we're capturing.

One of the most significant change agents has been evolving end-user services. Why? Because you can't support functionality and services (and often usability) if the needed metadata isn't there, or is in the wrong form. Having an extensible architecture is vital. Identifying the standards to be used and having production workflows that can process appropriate content in a timely fashion are key. But really, it's all about the metadata.

Ex: We want to be able to support sorting and grouping of search results by creator or title, which is easier if there are pre-generated sort names and sort titles (doing it on the fly takes a lot of processor overhead).
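Here is a minimal sketch of generating a sort title once, at indexing time, so a sorted search only has to compare stored strings. The English-only article list is an assumption for illustration. When a MARC record is the source, the title field's second indicator already records how many leading nonfiling characters to skip, and the optional `nonfiling` parameter shows how that count could be used instead. The function and field names are hypothetical.

```python
import re
import unicodedata
from typing import Optional

# Assumption: English leading articles only, used when no nonfiling count
# is available. Other languages would need their own article lists.
LEADING_ARTICLES = ("the ", "a ", "an ")


def sort_title(title: str, nonfiling: Optional[int] = None) -> str:
    """Build a normalized sort key for a title, to be stored in the index."""
    if nonfiling is not None:
        # Trust the cataloger's count of characters to skip (e.g. 4 for "The ").
        title = title[nonfiling:]
    # Decompose accented characters and drop the combining marks.
    key = unicodedata.normalize("NFKD", title)
    key = "".join(c for c in key if not unicodedata.combining(c))
    key = key.casefold().strip()
    if nonfiling is None:
        for article in LEADING_ARTICLES:
            if key.startswith(article):
                key = key[len(article):]
                break
    # Drop punctuation and collapse runs of whitespace.
    key = re.sub(r"[^\w\s]", "", key)
    return re.sub(r"\s+", " ", key).strip()


# Pre-generate once per record rather than on every sorted search.
records = [{"title": t} for t in ("The Phantom Tollbooth", "Armadillo Rodeo", "The Little Prince")]
for record in records:
    record["sort_title"] = sort_title(record["title"])
ordered = [r["title"] for r in sorted(records, key=lambda r: r["sort_title"])]
```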

Ex: We want to create aggregation objects that bring together multi-volume series or issues in a serial title, which is easier if you have the most complete enumeration possible and identify scope at as granular a level as possible (e.g., volume, issue, article).
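A minimal sketch of why complete enumeration matters when building aggregation objects: items that carry a volume and issue can be slotted into order automatically, while items without enumeration end up in a pile that someone has to sort out by hand. The field names and sample data are hypothetical.

```python
from typing import Dict, List, Tuple


def aggregate(items: List[Dict]) -> Tuple[Dict[str, Dict[int, List[str]]], List[str]]:
    """Group items as title -> volume -> item ids in issue order.

    Items with no volume cannot be placed and are returned separately.
    """
    tree: Dict[str, Dict[int, List[str]]] = {}
    unplaced: List[str] = []
    # Sort by (volume, issue); a missing issue sorts first within its volume.
    ordered = sorted(items, key=lambda i: (i.get("volume") or 0, i.get("issue") or 0))
    for item in ordered:
        if item.get("volume") is None:
            unplaced.append(item["id"])
            continue
        tree.setdefault(item["title"], {}).setdefault(item["volume"], []).append(item["id"])
    return tree, unplaced


# Hypothetical serial issues, one of them missing its enumeration.
items = [
    {"id": "v2i1", "title": "Journal X", "volume": 2, "issue": 1},
    {"id": "v1i2", "title": "Journal X", "volume": 1, "issue": 2},
    {"id": "v1i1", "title": "Journal X", "volume": 1, "issue": 1},
    {"id": "loose", "title": "Journal X"},
]
tree, unplaced = aggregate(items)
```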

Ex: We want to support faceted subject navigation, which is easier if the subject terms are broken out in a granular way from their pre-coordinated forms, such as identifying the geographic vs. topical vs. temporal parts of the subject.
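In MARC 21 subject fields such as 650, the parts of a pre-coordinated heading are already coded in separate subfields ($a topical term, $x general subdivision, $y chronological subdivision, $z geographic subdivision, $v form subdivision), so one way to get facets is to map subfield codes to facet types before the codes are lost in a flattened display string. Here is a minimal sketch, assuming headings arrive as (code, value) pairs; the example heading is illustrative only, not taken from a real record, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

# MARC 21 subject subfield codes mapped to facet types. Treating $x as topical
# is a simplification: older records may carry form subdivisions in $x.
FACET_FOR_CODE = {
    "a": "topical",
    "x": "topical",
    "y": "temporal",
    "z": "geographic",
    "v": "form",
}


def split_heading(subfields: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Break one pre-coordinated heading into facet -> values."""
    facets: Dict[str, List[str]] = {}
    for code, value in subfields:
        facet = FACET_FOR_CODE.get(code)
        if facet is None:
            continue  # e.g. $0 or $2: identifiers and sources, not facet values
        # Strip the trailing period that subject strings often end with.
        facets.setdefault(facet, []).append(value.strip().rstrip("."))
    return facets


# Illustrative heading, displayed as "Voodooism -- Haiti -- History -- 20th century."
heading = [("a", "Voodooism"), ("z", "Haiti"), ("x", "History"), ("y", "20th century.")]
facets = split_heading(heading)
```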

Each of these requires a change to our DTD and/or the patterns of our encoding, and sometimes requires us to regenerate the metadata from the original sources. But each time we both better document the objects and improve the services and the interface that we provide, so it's worth it.

If you're interested in what we've delved into so far:

Thursday, September 21, 2006

Google Book Search

Google Book Search now features "Find this book in a Library" links to WorldCat. It's not as prominent as the options to buy a book; it appears below the list of potential purchase locations. You do need to enter your ZIP code to see the list of locations, but that's normal for an individual user of Open WorldCat. It provided a nice box identifying my home library as UVA, with links to search for the book in my Resolver or catalog. It passed seamlessly through our Resolver, which recognized the item as a book and passed the request along to our OPAC. The direct OPAC link worked fine as well. Google Scholar and Google have had this feature for a while -- it's nice to see it here.

I still wish I could figure out how to consistently determine the provenance of a digitized book in GBS, though. When the book has come from the publisher, the source is shown in the bottom left-hand corner. When a book has been digitized as part of the Google Library project, it says nothing.

Monday, September 18, 2006

ubiquitous access

This morning I was at a Library town meeting where James Hilton, the University's new CTO, was speaking. I've heard him speak before at conferences, and I found an excellent article that covers much of what he discussed this morning and that I think people should read:

What struck me this morning was the audience's reaction to much of what he was saying, because so much of it was new to them. Email is old-fashioned? What does DRM stand for? What is "Rip. Mix. Burn."?

The timing was strangely serendipitous. Last Friday, Jim Campbell was telling me that we should do something about developing a better-educated clientele -- not our patrons, but our internal Library clientele. Today we officially started planning a discussion series with our Library training coordinator to introduce our Library staff to information resources they might not know about (but our patrons probably do). It's not principally about new technologies, but about new resources. Exposing Library staff to new technologies tangentially would be a bonus.

Among the topics that have come up -- Library blogs and blogging, Wikipedia and wikis, IMDb, Flickr, Google Scholar and Book Search, (Open), RSS, MySpace, YouTube, LibraryThing, the Long Tail, and Web 2.0/Library 2.0 social systems. I'm sure there's a lot more we haven't gotten on the list yet.

Monday, September 11, 2006


Cory Doctorow has an interesting posting on the online version of Locus (a publication dedicated to the science fiction and fantasy publishing world) entitled "How Copyright Broke."

I'm particularly interested in his focus on end users' rights, and the extreme lack of understanding of those rights:

No, the realpolitik of unauthorized use is that users are not required to secure permission for uses that the rights holder will never discover. If you put some magazine clippings in your mood book, the magazine publisher will never find out you did so. If you stick a Dilbert cartoon on your office-door, Scott Adams will never know about it.
When it comes to retail customers for information goods -- readers, listeners, watchers -- this whole license abstraction falls flat. No one wants to believe that the book he's brought home is only partly his, and subject to the terms of a license set out on the flyleaf.
But customers understand property -- you bought it, you own it -- and they don't understand copyright. Practically no one understands copyright.
There's no conceivable world in which people are going to tiptoe around the property they've bought and paid for, re-checking their licenses to make sure that they're abiding by the terms of an agreement they doubtless never read.
The answer is simple: treat your readers' property as property. What readers do with their own equipment, as private, noncommercial actors, is not a fit subject for copyright regulation or oversight. The Securities Exchange Commission doesn't impose rules on you when you loan a friend five bucks for lunch. Anti-gambling laws aren't triggered when you bet your kids an ice-cream cone that you'll bicycle home before them. Copyright shouldn't come between an end-user of a creative work and her property. Of course, this approach is made even simpler by the fact that practically every customer for copyrighted works already operates on this assumption.
I'm always interested in what he has to say, given his relationship with Creative Commons, and his release of his own works in digital form under CC licenses. This column doesn't suggest _how_ to change practices, but it's an interesting call-to-arms for authors.

I'm also always glad to read articles that express the distinctions between copyright, trademark, and licenses. I cannot count the number of outrageous beliefs that I have heard espoused by librarians regarding the aggregated set of topics that is usually just referred to as copyright. The most common issues that I run into are a lack of understanding of three topics:
  • The difference between copyright and other rights -- access rights and use rights; and
  • How rights (whether copyright or access or use rights) can be assigned via licenses that can override the rights assumed under the fair use doctrine; and
  • That a work that is in the public domain (say, a novel from 1890) can have manifestations (like a 1990 print edition) that are copyrighted.
The first is an issue of generating understanding of what restrictions a copyright holder can request of the users of a copyrighted work. This includes requesting limited access (such as limiting access to authorized users), or use restrictions (such as not allowing commercial use without permission).

The second is the assumption that fair use trumps, well, everything. The catch, as I understand it, is that contracts (and a license is a contract) can and do have restrictions in them that disallow uses that we would think of as OK under fair use. Contracts, under US law, are what can actually trump anything. If authors (or vendors to whom rights holders have licensed the right to enter into licenses with users) want to design licenses that say a work can never be quoted in any context without express permission, they can. Luckily, libraries generally have smart acquisitions folks who review licenses with a fine-toothed comb to identify such unlikely terms and push back on them where possible.

The third is one that comes up a lot in the text digitization realm. How can, say, an edition of Tom Sawyer be copyrighted? Isn't it in the public domain? It was published in 1876! The catch is that publishers can copyright their editions/manifestations because of the work that goes into designing and producing them. So yes, an edition of Tom Sawyer can be copyrighted. As can the electronic transcription of a microfilm image of a newspaper article from 1885. The original print article is no longer covered by the publisher's rights, but the publisher of the microfilm and the creator of the electronic transcription can claim rights if their manifestation is deemed copyrightable, and therefore place restrictions. There is a lot of discussion about what level of production work makes a manifestation copyrightable, but that's a whole other topic.

Friday, September 08, 2006


Carrying on from the meme started at Confessions of a Mad Librarian and continued by Dorothea, here is the haiku Digital Access Services mission:

information with
as little impediment
as is possible

Wednesday, September 06, 2006

Google News Archive Search

I found this interesting article on Search Engine Watch:

Google's new News Archive Search lets you search back over twenty decades worth of historical content, including scads of articles not previously available via the search engine.

"The goal of this service is to allow people to search and explore how history unfolded," said Anurag Acharya, Google distinguished engineer, who played a major role in shepherding the new product.

Google has partnered with news organizations including Time, The Wall Street Journal, The New York Times, the Guardian and the Washington Post, and aggregators including Factiva, LexisNexis, Thomson Gale and HighBeam Research, to index the full-text of content going back 200 years.

Archived news results can be found in three ways. You can search the news archives directly through a new News Archive Search page. News archive results are also returned when you search on Google News or do a general Google web search and your query has relevant historical news results.

Both free and fee-based content is included in Archive Search, with content from both publishers and aggregators. Search results available for a fee are labeled "pay-per-view" or with a specific price indicated. Google does not host this content; clicking on a link for fee-based content takes you to the content owner or aggregator's web site where you must complete the transaction before gaining access to the content.

It's an interesting range of content -- many, many newspapers, Time Magazine, etc. -- but when I searched, the lion's share of what I found was for-fee or restricted by subscription, not free. Even materials dating to the 1850s-1890s were restricted by subscription or pay-per-view.

I tried a Washington Post article from 1894. Now, we subscribe to ProQuest Historical Newspapers, including the Washington Post. But the links in the Google search results took me through something called ProQuest Archiver, which redirected me to the Washington Post archive where I was asked to pony up $3.95. So I searched ProQuest Historical Newspapers and came up with the same article, free to me because it is covered by our subscription.

So, it's an interesting discovery tool, but unrelated to our licensed resources and expensive to use if you have to pay for almost everything you find. Why doesn't it have an OpenURL Resolver service like Google Scholar, so authorized users can get to authorized resources?
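For comparison, an OpenURL link is just a resolver's base URL plus a query string describing the citation, which is what lets a resolver route an authorized user to a licensed copy. Here is a minimal sketch using the OpenURL 1.0 (ANSI/NISO Z39.88-2004) key/encoded-value format for an article; the resolver address is a hypothetical placeholder, the function name is invented, and a real link would carry whatever further metadata the source has (volume, pages, and so on).

```python
from urllib.parse import urlencode

# Hypothetical base URL; each institution's link resolver has its own.
RESOLVER_BASE = "https://resolver.example.edu/openurl"


def openurl_for_article(jtitle: str, atitle: str, date: str,
                        resolver: str = RESOLVER_BASE) -> str:
    """Build an OpenURL 1.0 KEV link describing a journal or newspaper article."""
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.jtitle": jtitle,
        "rft.atitle": atitle,
        "rft.date": date,
    }
    # urlencode percent-encodes values (':' becomes %3A, spaces become '+').
    return f"{resolver}?{urlencode(params)}"


link = openurl_for_article("Washington Post", "Example headline", "1894")
```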

Tuesday, September 05, 2006

Thinking about user expectations for Library services

Our institution has been brainstorming about user access to content. I have been thinking about user expectations for Library online services in the context of other services that our community -- faculty and students -- use in their lives. I wrote this in July, framing it around my activities in a single weekend:

What did I do during an average weekend in summer 2006?

I checked my email.

I watched a movie that I rented via Netflix. It was a movie that I had never heard of before adding it to my rental queue -- Netflix's recommender service suggested that I might like it based on my rental history. And I did. If I were so inclined, I could add a personal review to the Netflix site.

The movie was based on a Korean novel, so I looked it up using Google and Wikipedia.

I went to a movie at a theater with a friend. I looked up the time online and we confirmed using cell phones.

I scheduled payments for bills directly from my bank account online.

I ordered a media cabinet for my living room. I can track its shipping progress through the UPS web site.

I read part of a book that I purchased through It was highly recommended by a colleague, but I had also read reviews on a number of blogs, and through those blogs I found the blog written by the author -- the senior editor at Wired Magazine. I subscribed to the RSS feed so I wouldn't miss any postings.

I added that book to my personal LibraryThing catalog. 50 other users of the service also have the book, and I could read their reviews, see what tags they used, or start a discussion with them about the book's topic.

A couple of contacts on my flickr space wondered why I hadn't added any new pictures of my house, so I sorted through photos taken with my digital camera in preparation to upload them.

I had a conversation with my neighbor across the street about an anomaly that she found while searching for a new book in our OPAC. She's a retired English professor.

So, in a single weekend, I interacted with more than a half dozen digital services and took advantage of the output of several online social networks. This is in my personal life, and doesn't even take into account my professional activities. I may be more digital than some, but I'm part of the same generation as many of our more recently tenured faculty, and I know that our graduate students and undergraduates are even more plugged in than I am.

Given the number and types of digital services that our users encounter, what expectations might they have about the Library's services? How can the services be personalized for them? What notification services can we set up? Can we create more interactive/social networking opportunities for our users? How can we improve our core online service -- the ability to locate and use our collections? I'd like to see us give some serious thought to these issues, looking at other libraries and at the types of services that our users encounter in every other aspect of their lives.

Friday, September 01, 2006

Here I go ...

I am fascinated by digital libraries. How the content is selected, production and metadata standards, repository infrastructure, interfaces, and whether they are usable and useful. It's what I do every day.

I lead a digital life. Online banking, online shopping, digital cameras, LibraryThing, flickr, iPods, DVRs, etc. I never seem to get away from it.

I originally planned to call this blog "Digital Armadillo." I am fascinated by armadillos. The shape of their heads, their armor, their gait, and even the sight of dead ones on the side of the road, in the position that has come to be referred to as "casters up" in our household. I have acquired, through my own efforts and those of others, a selection of armadillo-themed objects. A coffee mug, magnets, postcards, a plush armadillo, a concrete garden ornament, and many pieces of Mexican folk art.

Somehow I will bring these topics and others together in a single stream-of-consciousness.