Wednesday, September 24, 2008

new version of getty introduction to metadata

The third edition of the Getty Introduction to Metadata -- edited by Murtha Baca, with essays by Tony Gill, Anne J. Gilliland, Maureen Whalen, and Mary Woodley -- is now available online and in hard copy. This is a very useful overview and it's nice to see it updated.

Thursday, September 18, 2008

generational myths

Siva Vaidhyanathan has a great article in The Chronicle Review entitled "Generational Myth." Siva and I first met through our online discussion of this topic -- I very strongly agree with him on this issue.

Lorcan Dempsey posted about a couple of blog posts by Andy Powell and Dave White about their takes on this issue. Dave's proposed "Resident" and "Visitor" categories, together with his acknowledgment of the spectra of behaviors that these categories represent, are a well-considered take on how libraries might better understand styles of learning of distance students in particular. I'm obviously not a fan of human categorization -- people are notoriously hard to pigeonhole. But I think these are actually more akin to personas than categories, like those that you'd develop as an exercise when designing a new online service. Not unerringly accurate, but not without usefulness. It certainly supplements the often simplistic thinking about our users as "faculty" or "graduate students" or "undergraduates" or "the public."

I also strongly recommend Janna Brancolini's blog Generation Underrated, her response to Mark Bauerlein's The Dumbest Generation: How the Digital Age Stupefies Young Americans and Jeopardizes Our Future (Or, Don't Trust Anyone Under 30). Check out what someone under 30 has to say, who also happens to be the daughter of a digital librarian.

grapes need a eula?

From Serious Eats, an image of an empty bag of grapes ... with a EULA.

The recipient of the produce contained in this package agrees not to propagate or reproduce any portion of the produce, including (but not limited to) seeds, stems, tissue and fruit.
To me this is particularly amusing because they're seedless grapes ...

Wednesday, September 17, 2008


There is major buzz around the announcement that the Los Alamos National Laboratory Research Library has released djatoka, a "reuse friendly" open source JPEG2000 Image Server. It's available on SourceForge under the GNU Lesser General Public License.

There's an excellent D-Lib article that fully describes the server. I love the first sentence of the article: "The Digital Library Research & Prototyping Team at the Los Alamos National Laboratory (LANL) enjoys tackling challenging problems." Now there's an understatement!

We did some explorations with Kakadu (one of the components of djatoka) when I was at UVA, and we use Aware at LC. I plan to take a long, hard look at this.

smithsonian digitization initiative

There's an announcement on CNN that the Smithsonian plans to put its 137 million-object collection online. The new Smithsonian Secretary G. Wayne Clough said in an interview that they do not yet know how long it will take or how much it will cost to digitize the full 137 million-object collection and will do it as money becomes available. A team will prioritize which artifacts are digitized first. They plan to focus on making the collections usable for the K-12 audience.

When I was at the Smithsonian yesterday for David Weinberger's talk, this seemed to be a buzzing topic of discussion among audience members; one Smithsonian employee even mentioned it in a question to Weinberger, expressing a certain level of surprise.

Tuesday, September 16, 2008

small pieces loosely joined by metadata

Today I had the extreme pleasure of attending a talk by David Weinberger (The Cluetrain Manifesto, Small Pieces Loosely Joined, Everything is Miscellaneous) at the Smithsonian, entitled "Knowledge, Noise and the End of Information." It was webcast, and I strongly suggest viewing it if you can.

There was lots of interesting discussion about the definition of information, the innately social nature of the human race and how social interaction is a vital aspect of information discovery, and how the loosely joined and messy nature of the internet just reflects human nature and is not a bad thing. He also stressed that one can never know what digital information might be of importance in the future, so we should, as cultural institutions, be striving to keep as much as possible. He also touched on the importance of brand and authoritativeness, but cautioned against equating that with control.

A word I did not expect to hear today, let alone about a hundred times, was "metadata." The cellphone image above is a shot of one of his concluding statements. He talked a lot about the importance of metadata, whether it be authoritative cataloging, community tagging, or contextual relationships through linking. Since we cannot ever imagine all the uses for our digital content we cannot possibly expend the costly effort to provide all the descriptive metadata that every community might want or need, so all three are complementary and of equal value.

One of my take-aways was that this again shows the importance of just getting digital content out there. Let the content express itself through its authoritative metadata, but also provide open access and support multiple mechanisms through which it can be incorporated into new contexts and uses and gain new descriptions.

Monday, September 15, 2008

open access to museum collections

Last Friday there was a post on Open Access News that Wake Forest University's Anthropology Museum had issued a press release about the launch of its online collections, supported by an IMLS grant.

I welcomed this news on many fronts -- there aren't enough ethnographic or archaeological collections online; the museum is using Re:discovery, a great product geared toward small museums; and I have a number of friends with ties to Wake Forest and I've visited Winston-Salem many times and have a fondness for the area.

What made me sit down to think about this for a few days was the passing description of this as an Open Access project.

I worked for many years in the museum community, and every museum that I ever worked for or consulted for wanted to make its collections available in one digital form or another. The Museum Computer Network was founded in 1967 to enable museums to automate their processes and convert collections records to digital form. Museums were among the earliest institutions to share their collections online in the mid 1990s. The University of California Museum of Paleontology had a web site in 1994. The Fine Arts Museums of San Francisco brought their Thinker "imagebase" online in 1996 -- and they had volunteers assist with an early form of experimental user-supplied subject metadata, e.g., proto-tagging. By 1997 the National Gallery of Art provided access to over 100,000 objects in its collection, and the Los Angeles County Museum of Art experimented with converting print museum catalogs into freely available online publications.

Sure, there have been lengthy discourses about levels of access to the digital media surrogates and questions of rights and control of those new media assets, and there is some information about the acquisition of objects that's subject to privacy restrictions, but no museum wants to limit discovery of their collections -- they want to facilitate their collections' use in research and teaching.

I've just not heard it described as "open access" before.

I'm not saying that it isn't a sort of open access initiative -- it most obviously is -- but I just think of it as such a normal museum activity that I don't categorize it in my mind as anything other than business as usual. Then it hit me -- for the past 15 years museums have been major players in the open access movement without necessarily always knowing it.

Labeling this an open access initiative re-contextualizes this core museum activity into a different realm -- one that I hope will make museum collections information more visible and reinforce the importance of all categories of open access content.

Friday, September 12, 2008

nsdl metadata registry

This afternoon a group of us had the opportunity to sit down with Jon Phipps, implementer of the NSDL Metadata Registry.

I knew that such a thing existed. I understand RDF. I know about SKOS. I hadn't really given a lot of thought as to how to best take advantage of it.

Today, I had one of those skies are opening and angels are singing from on high moments. RDF can be used to model relationships between concepts and potentially enforce them through schemas. This can obviously be applied to improve discoverability when a hierarchical taxonomy is employed. Then my LC colleague Clay Redding said that he was experimenting with multiple schemas and managing additional local alternative labels in addition to authoritative preferred labels. And then Jon and Ed Summers mentioned the potential for this tool to map across schemas. My a-ha moment was understanding the potential for formalized mappings across metadata schemas to improve discoverability within and across collections described with heterogeneous taxonomies and vocabularies.

I remember using Chenhall's Nomenclature in records for ethnographic objects where we recorded every level of the hierarchy in its own field -- it was madness. I remember when we were in the early days of the AAT, busily submitting new terms and building the hierarchies, our dream was searching for "case furniture" and getting results with bookcases, chests, desks, wardrobes, and every semantic child where "case furniture" never appeared in the record.
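
To make the "case furniture" dream concrete, here is a minimal sketch of thesaurus-enabled searching in Python. Everything in it is an assumption made for illustration: the NARROWER table only loosely imitates broader/narrower relationships, "writing desks" and the sample records are invented, and none of it reflects the real AAT structure or any particular registry's API.

```python
from collections import deque

# Hypothetical broader -> narrower relationships (illustrative only,
# not the actual AAT hierarchy).
NARROWER = {
    "case furniture": ["bookcases", "chests", "desks", "wardrobes"],
    "desks": ["writing desks"],
}

# Hypothetical catalog records: each carries only its most specific term.
RECORDS = [
    {"id": 1, "term": "bookcases"},
    {"id": 2, "term": "writing desks"},
    {"id": 3, "term": "chairs"},
]


def expand_term(term, narrower=NARROWER):
    """Return the term plus all of its descendants, breadth-first."""
    seen = [term]
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in narrower.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


def search(query, records=RECORDS):
    """Return ids of records whose term is the query or any narrower term."""
    terms = set(expand_term(query))
    return [record["id"] for record in records if record["term"] in terms]


print(search("case furniture"))
```

The records never contain the words "case furniture", yet the search finds the bookcase and the writing desk because the query is expanded down the hierarchy before matching.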

I remember some research at USC in the late 1990s about thesaurus-enabled searching. OCLC's Metadata Switch project has done some work in cross-schema mapping. I know this is very difficult to accomplish. Today was the first time I saw a tool that might make the conceptual mapping simpler. But not simple. This is a potentially massively overwhelming task if it can't be done programmatically to a large extent.
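
As a rough illustration of what a formalized cross-schema mapping could look like once it exists, here is a small Python sketch. It borrows the relation names exactMatch and closeMatch from SKOS, but the two vocabularies ("local:" and "thesaurus:"), their terms, and the function names are all hypothetical; this is not how the NSDL Metadata Registry or the Metadata Switch project actually works.

```python
# Hypothetical mappings between two vocabularies. The relation names follow
# SKOS mapping properties; every term and prefix is made up for illustration.
MAPPINGS = [
    ("local:bookshelf", "exactMatch", "thesaurus:bookcases"),
    ("local:trunk", "closeMatch", "thesaurus:chests"),
    ("local:armoire", "exactMatch", "thesaurus:wardrobes"),
]


def equivalents(term, mappings=MAPPINGS, relations=("exactMatch",)):
    """Return terms mapped to `term` in either direction via allowed relations."""
    found = []
    for source, relation, target in mappings:
        if relation not in relations:
            continue
        if source == term:
            found.append(target)
        elif target == term:
            found.append(source)
    return found


def expand_query(term, **kwargs):
    """Return the original term followed by its mapped equivalents."""
    return [term] + equivalents(term, **kwargs)


print(expand_query("thesaurus:wardrobes"))
print(expand_query("thesaurus:chests", relations=("exactMatch", "closeMatch")))
```

Applying the mappings is the easy part. The hard part, as noted above, is building and maintaining the mapping table itself, which is why doing as much of it programmatically as possible matters.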

I'm coming late to the party, but now I'm really intrigued by what might be accomplished in this arena.

Tuesday, September 09, 2008

Cory Doctorow book of essays

I am a big fan of Cory Doctorow's writing -- his fiction and his essays on technology, rights, and privacy. Via BoingBoing comes word of his new book of essays -- Content: Selected Essays on Technology, Creativity, Copyright, and the Future of the Future -- which he is making available as a free Creative Commons licensed PDF download.

If you haven't read Cory Doctorow yet, you should. I don't always agree with everything he says, but he is thoughtful and technologically savvy and writes thorough essays on very relevant topics in an entertaining style.

I've read some of these essays before, but having them together in one beautifully-designed volume that I can always refer to is the proverbial good thing.

LoC Repository Development Group hiring

Our group has a position open. Visit the LoC jobs page and search for posting "080214". The posting does not mention our unit specifically, so this is a heads-up that the job is with us. We're still a relatively new group, working on a variety of projects with many units across the Library and developing our group's role in the institution.

The application period closes on October 3, and that is an absolute deadline. You must apply using an online federal job application system -- it's a lengthy form that requires some time to fill out. Be prepared with electronic copies of your documents to cut and paste.

EDIT (9/24/2008): This position reports to the Director of the Repository Development Group. Everyone in the team -- including me -- reports to the Director. There is no additional management structure.

Monday, September 08, 2008

google newspaper digitization

Google is digitizing newspapers.

Not only will you be able to search these newspapers, you'll also be able to browse through them exactly as they were printed -- photographs, headlines, articles, advertisements and all.

This effort expands on the contributions of others who've already begun digitizing historical newspapers. In 2006, we started working with publications like the New York Times and the Washington Post to index existing digital archives and make them searchable via the Google News Archive. Now, this effort will enable us to help you find an even greater range of material from newspapers large and small, in conjunction with partners such as ProQuest and Heritage, who've joined in this initiative. One of our partners, the Quebec Chronicle-Telegraph, is actually the oldest newspaper in North America—history buffs, take note: it has been publishing continuously for more than 244 years.

You’ll be able to explore this historical treasure trove by searching the Google News Archive or by using the timeline feature after searching Google News. Not every search will trigger this new content, but you can start by trying queries like [Nixon space shuttle] or [Titanic located]. Stories we've scanned under this initiative will appear alongside already-digitized material from publications like the New York Times as well as from archive aggregators, and are marked "Google News Archive." Over time, as we scan more articles and our index grows, we'll also start blending these archives into our main search results so that when you search, you'll be searching the full text of these newspapers as well.
It's interesting that they're working directly with publishers and with aggregators such as ProQuest to digitize and improve discoverability of back files. That's good news, but do they also plan to work with major newspaper open access projects such as the National Digital Newspaper Program? Are they digitizing any collections in addition to publisher collections?

When I last looked at the Google news archive in September 2006 I found that way too much of the content was pay-per-view, made you pay even if your institution had licensed subscription access, and didn't work with OpenURL resolvers. I don't see that any of that has changed. I hope it will.

vintage museum photos

Via BoingBoing, check out these fabulous vintage photos from the American Museum of Natural History. I love dioramas, and the exhibit installation images are just great. Taxidermy mounting, diorama background painting, articulating dinosaur bones, casting animal models ... And the vintage exhibitions! I love the images of earnest children being led around ... and doing so-called Indian dances in their construction paper bonnets. State-of-the-art, 1900s-1970s.

Saturday, September 06, 2008

ambient awareness

This week's New York Times Magazine has a piece by Clive Thompson that explores issues around ambient awareness and privacy. Facebook, twitter, flickr, dopplr, and texting and blogging more generally. Is it narcissistic to broadcast your status using awareness tools? Are these tools to improve connectedness in a more mobile and global human ecology -- the ultimate tools for building and maintaining relationships?

This is the paradox of ambient awareness. Each little update — each individual bit of social information — is insignificant on its own, even supremely mundane. But taken together, over time, the little snippets coalesce into a surprisingly sophisticated portrait of your friends’ and family members’ lives, like thousands of dots making a pointillist painting. This was never before possible, because in the real world, no friend would bother to call you up and detail the sandwiches she was eating. The ambient information becomes like “a type of E.S.P.,” as Haley described it to me, an invisible dimension floating over everyday life.
And when they do socialize face to face, it feels oddly as if they’ve never actually been apart. They don’t need to ask, “So, what have you been up to?” because they already know. Instead, they’ll begin discussing something that one of the friends Twittered that afternoon, as if picking up a conversation in the middle.
An interesting section focuses on the so-called "Dunbar Number" -- just how many people can you be "friends" with, anyway? According to anthropologist Robin Dunbar, about 150. Can you max out on social connectedness? Not really, since many of one's ambient connections are weak ties, not close, intimate friends. But weak ties are just as important a part of social and professional networks.

I find it useful to check in on my Facebook account and see the status newsfeeds of my friends and colleagues. I have also personally met all but a handful, and I believe that they are controlling their feeds and filtering what they write in their status in a way that maintains their chosen levels of privacy. I keep my status updated. I blog, and I know and expect that people who have never met me read it. But is the ability to follow personal newsfeeds and tweets of people you will never know a creepy invasion of privacy, making it too easy to develop parasocial relationships? Or is it all just part of ubiquitous ambient awareness where participation is increasingly not optional?

I originally refused to blog or join Facebook because I thought it was vain to assume that anyone wanted to know what I was thinking or doing, and that I'd be giving up my privacy. OK, I have given up some of my privacy, but I've also made new connections I might never have otherwise, re-established relationships that had gone dormant, and built stronger ties with geographically disparate friends. While I'm not willing to give up my privacy for a free cup of coffee, I am willing to give up some privacy to do that.

Wednesday, September 03, 2008


The University of Michigan has announced that their MBooks initiative has grown into a shared repository effort called the HathiTrust (pronounced hah-TEE).

HathiTrust was originally a collaboration of the thirteen universities of the Committee on Institutional Cooperation (CIC) to establish a repository for those universities to archive and share their digitized collections. All content to date has been supplied by the University of Michigan and the University of Wisconsin, and Indiana University and Purdue University will soon be contributing their digital materials. 20% of its current content is open access and 80% is restricted. Don't look for a single search interface yet -- it's planned. As they say: "Good, useful, technology takes time.... and the strength and insight born of collaborative work."

The new HathiTrust initiative has been funded for an initial five-year period beginning January 2008, and is now open to other institutions. Partners will be charged a one-time start-up fee based on the number of volumes added to the repository, in addition to an annual fee for the curation of the data. They already support both open access and dark archive materials, and will also do so for new partners.

Their July 2008 monthly report gives a good sense of their activities. It is interesting to note that the only initial ingest workflow supported is the Google partner workflow. That's not too surprising since this work is based on the MBooks project developed in support of Google content workflows. That in and of itself ensures that there are many institutions who'll be considering partnership.

This announcement is exceptionally exciting. I look forward to its development as a service.

Monday, September 01, 2008

kete 1.1

I've blogged about Kete before - version 1.1 has been released.

From the announcement:

Kete 1.1 is now available with a giant helping of new features and improvements. This is also the first release where you can grab Kete from our code repository's new home at See for details or browse the code online at

For those who haven't seen Kete in action, Kete is open source software that enables communities, whether the community is a town or a company, to collaboratively build their own digital libraries, archives and repositories. Kete combines features from Knowledge and Content Management Systems as well as collaboration tools such as wikis, blogs, tags, and online forums to make it easy to add and relate content on a Kete site. You could create a service like Google's Knol for your community using Kete.

An in-depth list of features and issues resolved can be found at , but here are some highlights: