Thursday, May 29, 2008

digital art and museums

WireTap Magazine has a great interview with Richard Rinehart from the Berkeley Art Museum/Pacific Film Archive that considers a number of topics, including why museums should care about and collect digital art, and why museums should make digital art available for remix. Richard has done some amazing work in promoting the curation and preservation of digital art, and makes amazing art himself. I've known Richard for about 15 years.

I also want to make a plug for Berkeley Big Bang 08 (one of Richard's projects) on June 1-3, and for 01SJ on June 4-8 in San Jose (Steve Dietz, another museum colleague who is well-known for his ground-breaking commissioning and curation of digital art at the Walker Art Center, is the Artistic Director for the festival). Anyone in the Bay Area with any interest in digital art and new media should attend one or both of these events next week.

Tuesday, May 27, 2008

more on the end of Windows Live Search Books

A roundup of commentary:

ars technica (includes comments from Brewster Kahle)

shimenawa/Peter Brantley


New York Times

Friday, May 23, 2008

first sale doctrine and software

There's nothing that I can add to this ars technica posting:

A federal district judge in Washington State handed down an important decision this week on shrink-wrap license agreements and the First Sale Doctrine. The case concerned an eBay merchant named Timothy Vernor who has repeatedly locked horns with Autodesk over the sale of used copies of its software. Autodesk argued that it only licenses copies of its software, rather than selling them, and that therefore any resale of the software constitutes copyright infringement.

But Judge Richard A. Jones rejected that argument, holding that Vernor is entitled to sell used copies of Autodesk's software regardless of any licensing agreement that might have bound the software's previous owners. Jones relied on the First Sale Doctrine, which ensures the right to re-sell used copies of copyrighted works. It is the principle that makes libraries and used book stores possible. The First Sale Doctrine was first articulated by the Supreme Court in 1908 and has since been codified into statute.

Read the entire post. The key to the ruling is that the judge decided that the Autodesk software was sold, not licensed, and therefore the First Sale Doctrine applied.

There's also an excellent post on William Patry's blog, with some very lively and extensive discussion that is itself worth reading.

Windows Live Search Books going away

Peter Brantley forwarded a Microsoft message from the Live Search blog to the DLF community. Excerpt:

"Today we informed our partners that we are ending the Live Search Books and Live Search Academic projects and that both sites will be taken down next week. Books and scholarly publications will continue to be integrated into our Search results, but not through separate indexes.

"This also means that we are winding down our digitization initiatives, including our library scanning and our in-copyright book programs. We recognize that this decision comes as disappointing news to our partners, the publishing and academic communities, and Live Search users.

"Based on our experience, we foresee that the best way for a search engine to make book content available will be by crawling content repositories created by book publishers and libraries."
I never really used Live Search Books, but I have colleagues who said very good things about it, some of whom thought it was better than Google Books in terms of search success and consistency and delivery UI. I hope that the output of the digitization by the partners can be re-purposed into other services. Their message encourages partners to continue working with the Internet Archive, so I feel hopeful that the equipment and processes put into place for this project will also continue to produce output even without Microsoft's involvement.

Thursday, May 22, 2008

more on oclc and google

Here's an article in Information Today on the OCLC/Google agreement.

This focuses more on the addition of Google Book links into existing WorldCat records, and the creation of new records for volumes not currently in WorldCat. I wonder if this means adding 856 field links to digital surrogates, or creating separate digital resource records? All OCLC member institutions will be able to add records into their catalogs.
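
To make the first option concrete, here is a minimal sketch of what adding an 856 link to an existing record might look like. It uses plain Python rather than a MARC library; the record structure, the `add_856_link` and `to_mnemonic` helpers, and the example.org URL are all hypothetical, and nothing here reflects what OCLC actually plans to do. The indicators follow the MARC 21 convention of first indicator 4 for HTTP access and second indicator 1 for a version of the resource.

```python
def add_856_link(record, url, note=None):
    """Append an 856 (electronic location and access) field to a record.

    `record` is a hypothetical stand-in: a dict holding a "fields" list.
    """
    subfields = [("u", url)]            # $u carries the URI
    if note:
        subfields.append(("z", note))   # $z carries a public note
    field = {"tag": "856", "ind1": "4", "ind2": "1", "subfields": subfields}
    record["fields"].append(field)
    return field


def to_mnemonic(field):
    """Render a field in the line-oriented form many MARC tools display."""
    body = "".join(f"${code}{value}" for code, value in field["subfields"])
    return f"={field['tag']}  {field['ind1']}{field['ind2']}{body}"


record = {"fields": []}
link = add_856_link(
    record,
    "http://example.org/books?id=HYPOTHETICAL",  # placeholder, not a real link
    note="Digitized copy",
)
print(to_mnemonic(link))
```

The alternative, a separate record for the digital resource, would carry its own description and identifiers rather than hanging a link off the print record.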

This article doesn't mention one aspect of the agreement -- that the agreement now allows Google Book partners to share records from their catalogs that have an OCLC provenance with Google (or let OCLC do it for them). There has always been a lot of discussion about what rights member institutions had vis-a-vis sharing OCLC-sourced records that represent their holdings, and not everyone agrees with OCLC's assertions of its rights. Given Google's need to know something about the volumes that it's digitizing to provide access, it seems unavoidable that sharing of some OCLC-sourced metadata between Google Book participants and Google has already happened. Now it's a recognized need and activity.

Wednesday, May 21, 2008

oclc and google cooperation

On Monday a press release was issued about cooperation between OCLC and Google. Excerpted:

OCLC and Google Inc. have signed an agreement to exchange data that will facilitate the discovery of library collections through Google search services.

Under terms of the agreement, OCLC member libraries participating in the Google Book Search™ program, which makes the full text of more than one million books searchable, may share their WorldCat-derived MARC records with Google to better facilitate discovery of library collections through Google.

Google will link from Google Book Search to, which will drive traffic to library OPACs and other library services. Google will share data and links to digitized books with OCLC, which will make it possible for OCLC to represent the digitized collections of OCLC member libraries in WorldCat.


WorldCat metadata will be made available to Google directly from OCLC or through member libraries participating in the Google Book Search program.

Google recently released an API that provides links to books in Google Book Search using ISBNs, LCCNs and OCLC numbers. This API allows users to link to some books that Google has scanned through a “Get It” link. The link works both ways. If a user finds a book in Google Book Search, a link can often be tracked back to local libraries through

The new agreement enables OCLC to create MARC records describing the Google digitized books from OCLC member libraries and to link to them. These linking arrangements should help drive more traffic to libraries, both online and in person.
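
As a rough illustration of the identifier-based linking described in the release, the sketch below builds a lookup URL from ISBNs, LCCNs, and OCLC numbers. The endpoint and the `bibkeys`, `jscmd`, and `callback` parameters are assumptions that should be checked against Google's documentation; the `build_lookup_url` helper and the zero-filled identifiers are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against Google's documentation before relying on it.
BASE_URL = "https://books.google.com/books"


def build_lookup_url(isbns=(), lccns=(), oclc_numbers=(), callback="handleBooks"):
    """Build a lookup URL from prefixed identifiers (hypothetical helper)."""
    keys = [f"ISBN:{value}" for value in isbns]
    keys += [f"LCCN:{value}" for value in lccns]
    keys += [f"OCLC:{value}" for value in oclc_numbers]
    if not keys:
        raise ValueError("at least one identifier is required")
    query = urlencode(
        {"bibkeys": ",".join(keys), "jscmd": "viewapi", "callback": callback},
        safe=":,",  # keep the prefix and list separators readable
    )
    return f"{BASE_URL}?{query}"


# Placeholder identifiers, not real books.
url = build_lookup_url(isbns=["0000000000"], oclc_numbers=["00000000"])
print(url)
```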

There are a couple of big wins here for different communities.

For WorldCat users, there is direct access to Google Book Search volumes. For Google Book users, there is improved access to physical volumes.

For Google Book participant libraries, there are better mechanisms for getting metadata about collections to Google.

Even more importantly -- and I am being hopeful here and reading something into this that may not be there -- this is a potential way to get representation of volumes digitized as part of the Google project not only into WorldCat but into the OCLC/DLF Registry of Digital Masters. For folks unfamiliar with that project, it's a registry of digitized volumes -- which meet certain digitization standards and are publicly accessible -- that can be used as a tool by libraries and users to determine if volumes have already been digitized and are available. It's a slowly growing registry where a devoted group of participants has been working to develop the standards for describing digital masters and the work flows for adding records. This is a service that is poised to become essential.

copyright and course materials

I've been reading up on a case out of the University of Florida where Michael P. Moulton, an associate professor of wildlife ecology and conservation, is part of a legal battle against Einstein’s Notes, a company that sells students study kits and lecture notes.

While his publisher, Faulkner Press, brought the suit, one of the more interesting issues to me is Moulton's claim of copyright on his lectures and that any notes taken by students are an infringement, but an allowable fair use infringement as long as the notes aren't sold. Moulton and Faulkner Press have a strong case for infringement when it comes to material from his published textbook that he uses in teaching. Moulton's claim that his printed lecture study guides are copyrighted absolutely has merit. But I have often heard that faculty cannot claim copyright on their teaching material -- lectures, syllabi -- because those are work-for-hire products that they create as part of their employment at a university, and that the university either holds copyright or at least holds an interest in the copyright. There's also the issue that, if not in a fixed form, like a written lecture, a lecture can be protected but not necessarily copyrighted. The University of Florida cleared his copyright registration, so this has obviously been vetted by their general counsel's office.

I once worked on a project where a dean forbade faculty from making syllabi on course sites publicly accessible because they were considered a work-for-hire. It seems that university policies have very much shifted in the years since I last worked with online course issues.

Site on the suit

Chronicle of Higher Ed Interview

Wired article

ars technica post

Tuesday, May 20, 2008

Larry Lessig on orphan works in the NY Times

Larry Lessig has written an op-ed piece that appears in the New York Times on Congress's consideration of a major reform of copyright law intended to solve the problem of orphan works. He characterizes the current attempt at reform as "both unfair and unwise."

But precisely what must be done by either the “infringer” or the copyright owner seeking to avoid infringement is not specified upfront. The bill instead would have us rely on a class of copyright experts who would advise or be employed by libraries. These experts would encourage copyright infringement by assuring that the costs of infringement are not too great. The bill makes no distinction between old and new works, or between foreign and domestic works. All work, whether old or new, whether created in America or Ukraine, is governed by the same slippery standard.

The proposed change is unfair because since 1978, the law has told creators that there was nothing they needed to do to protect their copyright. Many have relied on that promise. Likewise, the change is unfair to foreign copyright holders, who have little notice of arcane changes in Copyright Office procedures, and who will now find their copyrights vulnerable to willful infringement by Americans.

The change is also unwise, because for all this unfairness, it simply wouldn’t do much good. The uncertain standard of the bill doesn’t offer any efficient opportunity for libraries or archives to make older works available, because the cost of a “diligent effort” is not going to be cheap. The only beneficiaries would be the new class of “diligent effort” searchers who would be a drain on library budgets.

It seems that this reform introduces so many layers of complexity into the determination of copyright status that it would inevitably quash potential use of works because no one could figure out the process or afford the time or expert opinion needed for the determination.

Wednesday, May 14, 2008

who owns rss feed content?

There's a good article in PC World out of Australia by Larry Borsato on the ownership issues around RSS feeds. If a site aggregates RSS feeds from other sites as content without permission from the original content owner, what are the legal or moral issues?

it's all about having options

Andrew Pace wrote an interesting post about his take on Library 2.0: It's the data, stupid. My highly simplified version of his thesis is that Library 2.0 is about control and presentation of data, and how we might give the best access to it.

I think that there are a couple of corollaries to this that libraries have only recently begun to consider and implement. First, there is NO ONE WAY to best provide access, and providing multiple paths and formats is necessary because we can never imagine what all the potential uses of our data are. Data should be exposed in as many ways as an institution finds sustainable, using appropriate community standards.

The second corollary is that while varied and easy access to data is vital, Library 2.0 is also about the personalization of discovery and use of data. Whether it's applying personal tags or personal filters to improve or focus discovery and retrieval, or applications that can take advantage of Identities and/or other APIs for personal or community-based mashups, it's all about how I might need to discover and use the data versus how Andy might need to work with data, and about the fact that those needs will likely be different next month than they are now. Last month I was researching Institutional Repository software solutions. This week I'm reading up on the Django web application framework. Last year I may have wanted to combine geographical and geocoding data with images. I don't know what I'm going to want to do next year.

Library 2.0 is about providing options to users, and removing barriers to innovative use and re-use.
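
As a small sketch of the first corollary, the same record can be exposed in more than one format from a single source. The record, its field names, and the helper functions below are hypothetical; the XML uses the Dublin Core element namespace as one example of a community standard, not as a recommendation of a particular schema.

```python
import json
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace

# Hypothetical record; the keys happen to match Dublin Core element names.
record = {"title": "Sample title", "creator": "Sample creator", "date": "2008"}


def as_json(rec):
    """Serialize the record as JSON for script-friendly consumers."""
    return json.dumps(rec, sort_keys=True)


def as_dublin_core(rec):
    """Serialize the record as simple XML with Dublin Core elements."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")
    for name, value in rec.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# One source record, several paths to it; new formats are added here.
FORMATS = {"json": as_json, "dc": as_dublin_core}


def expose(rec, fmt):
    """Return the record in the requested format."""
    return FORMATS[fmt](rec)


print(expose(record, "json"))
print(expose(record, "dc"))
```

Keeping the formats in one lookup table means adding another output is a matter of writing one more function, which is the kind of sustainability the corollary has in mind.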

Friday, May 09, 2008

happy birthday copyright

We've all heard that every time someone sings "Happy Birthday" the copyright holders should be getting paid. Via William Patry's blog, a link to a remarkable article and web site from Professor Robert Brauneis of George Washington Law School that present all the evidence he could find about the copyright status of the song.

application profile for images

In the most recent issue of Ariadne there's an interesting article entitled "Towards an Application Profile for Images" by Mick Eadie. JISC has just completed the first phase of some work drafting an application profile for images aimed primarily at the repository community.

I was happy to see recognition of the complexity of digital image objects and the need to track relationships between images, the sources of those images, the content depicted in the images, etc. They looked at FRBR and at the VRA Core, and ended up creating a conceptual model with the digital file at the center that uses the language of FRBR.

I find myself disagreeing with some of their decisions:

In our model, we have renamed the FRBR Work entity as ‘Image’ for reasons of clarity, mainly to avoid confusion between notions of Work as described traditionally in image cataloguing in the cultural sector (i.e. the physical thing) and abstract Work as described in FRBR. As noted above, image as defined in the IAP is a digital image, in line with our notion of end-users searching repositories for digital images of something. Therefore our conceptual model - while still using the language of FRBR and using the areas of SWAP that have applicability across the text and image domains - places the digital image at its centre.
Architecturally, the JISC model makes sense. You have an image, which is a depiction of a site or a work of art, with manifestations as one or more formats of files. That's what you have to physically manage in a repository.

From a discovery point of view, though, I'm not fully convinced. Having modeled image objects in a cataloging environment and in a repository architecture and discovery environment, I think that FRBR and the VRA Core have it exactly right as to what users are looking for -- images of something. Researchers are looking for images of Chartres Cathedral or Jeff Koons' "Rabbit" or Leonardo da Vinci's "Vitruvian Man." They are looking for images of those Works. When we designed the content models for images in the UVA Digital Collections Repository, the top level is that sense of work. The top level is a work object, which has child expressions for individual views of that work, which have child manifestations that are the actual media files. We found this relatively easy to manage and to build a discovery interface around. It was also the model already used in the cataloging, so it made for an easier translation from that system to the repository.
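
The work, expression, manifestation hierarchy described above can be sketched as a simple data structure. This is an illustration only, not the actual UVA repository content model; the class names, fields, and file names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manifestation:
    """An actual media file for one view."""
    filename: str
    media_type: str


@dataclass
class Expression:
    """One view of a work, with its media files as children."""
    view: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The thing users search for, such as a building or an artwork."""
    title: str
    expressions: List[Expression] = field(default_factory=list)

    def files(self):
        """List every media file that depicts this work, in view order."""
        return [m.filename for e in self.expressions for m in e.manifestations]


cathedral = Work("Chartres Cathedral", [
    Expression("west facade", [
        Manifestation("west_facade.tif", "image/tiff"),
        Manifestation("west_facade.jpg", "image/jpeg"),
    ]),
    Expression("nave interior", [
        Manifestation("nave.jpg", "image/jpeg"),
    ]),
])

print(cathedral.files())
```

Discovery starts at the work, so a search for the cathedral reaches every view and every file through the hierarchy rather than through each file's own description.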

I need to review the Images Application Profile (IAP) work in more detail. I know this is aimed more at use for IRs and not for image collections, but I think such a profile can only become ubiquitous if it covers both repository scenarios. Their proposed metadata is well-positioned with its use of MIX elements to manage the image files, but I see less than I'd like to see in descriptive elements that support discovery. For example, I don't see a descriptive content date element. I cannot think of a research use case that wouldn't include searching for, say, images of 18th-century French sculpture. I think they should give more thought to incorporating more from VRA Core and CDWA.

Thursday, May 08, 2008


I recently became aware of an interesting content management system called Kete, which was developed for the Kete Horowhenua site in New Zealand. It's a repository and discovery service that supports uploading and metadata creation through a web interface. It supports the inclusion of:

  • Images
  • Audio recordings
  • Video recordings
  • Documents
  • URLs for web resources
Metadata can be "locked" so only the creator can edit it, or be open for anyone to edit. They are collecting some amazing biographical details for their Anzac (veterans) collection through the community. Every "topic" (a subject, a place, a person) can have its own discussion.

Kete Horowhenua was developed with Ruby on Rails, utilizes the Zebra Z39.50 full-text indexing engine developed by Index Data, is fully compatible with Koha, and will be released under a GNU General Public License (GPL). The Kete software is available for download. They are in the process of building a release of the code without the Horowhenua project customizations that can be deployed using a web-based wizard that supports customization. They are looking for funding to support this work, as they admit that they underestimated the development needs. They are even accepting PayPal donations to help the work along!

It's an interesting looking site and tool, but I don't know how much is specific to the Horowhenua version and what will be in the generalized version. The browse UI could use a little refinement (I couldn't figure out how to sort, or if you can sort), but there's a lot of promise here.

Monday, May 05, 2008

beCamp 2008

As tired as I was by the time I got home Saturday night, I am very happy that I made it to beCamp this year. Close to 100 people attended over the 2 days, about half of whom hadn't attended last year. It was great to see so many new faces!

There was a spirited discussion of "Web 3.0" that spent a lot of time on advertising. Since I have never worked in the commercial sector, this is something that pretty much never occurs to me. Some folks looked at me as if I were crazy when I suggested that Web 3.0 was about personalization and localization services for users, not about advertising. There was a brief exchange about privacy issues, especially around the topic of providing personal information in exchange for personalized and localized services. I'm not sure that anyone would trade a lot of their personal info in exchange for a free coffee coupon, but others in the room were certain of this.

Josh Malone from the National Radio Astronomy Observatory led a great discussion on large-scale storage needs. There are certain similarities between their needs and the Library of Congress's needs vis-a-vis transfer of files and creation of deliverables for users, but their data is observational and is never updated, while our metadata and media files may be updated. They are considering some systems that I know another institution is using and I need to get folks in touch with each other. I also need to learn a lot more about our storage infrastructure at LC.

Baron Schwartz led a great session on MySQL optimization. He's just joined Percona as a MySQL consultant. Buy his upcoming book! I also realized after the fact that the reason I recognized his name was that, as an undergrad, he worked with my UVA colleague Perry Roland on the Music Encoding Initiative.

There was a roundtable on data visualization where I heard about a number of tools that I wasn't familiar with, especially two PHP graphing libraries -- jpGraph and sparkline. For anyone on twine, I have a Search and Information Visualization twine started.

Erik Hatcher led a Lucene optimization discussion. Bess asked lots of questions in preparation for getting Blacklight into production!

Steve Stedman gave an informal demo of the ExpressionEngine content management system. Very interesting, especially its capabilities in resizing graphics on-the-fly.

Those are the sessions I made it to -- check out the wiki for the other discussions on the schedule, and links to slideshare for available presentations. Ruby and Ramaze, High Availability Linux, Adobe AIR, Google App Engine ...

Friday, May 02, 2008

OhioLINK EAD repository and tools

OhioLINK has launched a Finding Aid Creation Tool and Repository for use by any institution in Ohio; membership in OhioLINK is not required.

The OhioLINK Finding Aid Repository takes advantage of XTF in a clean and simple way. It's not clear what repository is behind it. The EAD FACTORy tool is available for use by Ohio institutions for EAD authoring. It doesn't say when or if the code for the tool will be released.

following up on Bridgeman

Peter Hirtle has an excellent post on the LibraryLaw Blog about a panel presentation at the New York Bar Association called "Who Owns This Image? Art, Access, and the Public Domain after Bridgeman v. Corel." I also highly recommend Rebecca Tushnet's post.

I have encountered many interpretations of Bridgeman over the years. I've heard it used to defend or attack copyright, performance, and use rights policies relating to images. I have been thinking a lot lately about the appropriateness of claiming copyright of images that are captured of book pages while we often take advantage of the lack of copyright-ability of images of 2-D works of art.

I was surprised to note in a comment on William Patry's blog post on the event that the Art Institute of Chicago has an exhibit on copyright in art publishing: "Copyright Law: Publishing Art and the Public Domain." I wish there were more details online about the content. Has anyone seen the exhibit?

Thursday, May 01, 2008

beCamp this weekend

Even though I moved away from Charlottesville last week, I'll be back this weekend -- for beCamp! We'll likely miss part or all of Friday night, but nothing could keep us away on Saturday.


  • Friday, May 2nd, 5:00PM-9:00PM
  • Saturday, May 3rd, 9:00AM-5:00PM


  • Charlottesville Business Innovation Council (on the Downtown Mall)
  • 501 East Main Street, Charlottesville, Virginia 22902