Digital Eccentric: July 2007

Tuesday, July 31, 2007

DNA-driven music

I love that someone wants to use DNA as the source to generate music. It reminds me of Natalie Jeremijenko's kinetic sculpture at Xerox PARC that reacts to packets on its network. I'm saddened that someone felt the need to patent the idea, and it sounds like there's a lot of prior art to contradict the patent issuance.

random thoughts after reading the Ithaka report

I've read some very astute commentaries on the new Ithaka report "University Publishing in a Digital Age" from if:book, Dorothea, and Karen.

Dorothea and Karen both commented at length about the placement of IRs in the new publishing scheme. Dorothea hits the nail on the head when she points out that while IRs can be compared to dusty attics, they're still a form of distribution and preservation. It may be an informal form of distribution, but it is formal archiving: the output of scholarship in digital form is collected and maintained. The report acknowledges this. The perception that we're receiving large sums of money to do this is surprising. We're getting money to write software; we're not necessarily getting money to sustain the services that the software enables.

Is an IR publishing? Preservation and distribution, yes. Publishing? I'm not so sure. Part of the scholarly communication life cycle, a place to preserve publishing output? Absolutely. As Dorothea points out, this is a relationship we could have with University presses. It's one we're already developing directly with the faculty for their born-digital scholarship and their published works.

To get on my soap box, not all repositories are IRs. UVA operates a Fedora-based repository to manage and make usable the collections that our subject specialists and faculty have selected, which include our locally digitized collections and the born-digital scholarship that we collect directly from our faculty. In this way we make digital content available for new teaching and research, and we collect the work created from our collections and others. Fedora repositories will also have a role in the infrastructure of our digitization-on-demand service, our nascent "Academic Information Space" environment for the creation of digital scholarship, and for an IR. Repositories can have a role in all stages of the scholarly communication life cycle.

Then there's this in the report:

Librarians with whom we spoke view their role with respect to scholarly communications as making sure they have robust online collections; creating research environments (e.g. collections and tools) that will help faculty and graduate students create the scholarship of the future; finding ways for the institution to take back more control and lower the cost of scholarship; and developing infrastructure and tools to enable multimedia. Increasingly, these roles bleed into what might be considered “publishing”. The role of librarians has always been, in part, to provide services to the local community that help them find information, or learn how to find information. With the advent of online resources, librarians developed skills in accessing and managing online data. It therefore is not surprising that many faculty members and students have turned to librarians for assistance in producing electronic resources. One librarian stated that “Faculty are coming to us to help them with their electronic publishing needs. We have the technical expertise on staff to help them push the envelope of new forms of scholarship.” Another stated that “The library’s task is to create the online research environments of the future – collections, accessibility, tools.” Some librarians see themselves as pioneers and innovators in bringing scholarship online.

This statement resonated with me because that's what we've been doing at UVA for fifteen years. Now we're looking at more sustainable ways to enable this form of publishing, and for ways to preserve what we've created. It's time for a new service model.

Friday, July 27, 2007

book written via cell phone

I don't know whether the claim that this is the first book written using a cell phone is true, but it's likely the first book written using a phone to be published.

Italian author Robert Bernocco has amazed the literary world by publishing the world’s first book written using a mobile phone. Bernocco published it on Lulu.com, the online marketplace for digital content.

Cristel Lee Leed, European Vice President at Lulu.com, says, “We live in an age when individuals are strapped for time due to work and family commitments, and this can often stifle creativity. Robert Bernocco is a great example of the type of author we often encounter on Lulu — he has not only been creative with what he has written but also with how he has written it!”

Bernocco took advantage of his idle time while commuting to and from work by train, writing his 384-page science fiction novel, Compagni di Viaggo (Fellow Travelers is the English translation), on his Nokia 6630 phone, using the phone’s T9 typing system.

By dividing his manuscript into short paragraphs, Mr. Bernocco wrote his novel in perfect Italian, not your typical text-message shorthand, and saved the paragraphs on his mobile phone. Mr. Bernocco then downloaded them onto his home computer for proofreading and editing. The book took him 17 weeks to write.

It's also interesting that this press release includes none of the stigma that used to be attached to self-publishing -- that this was published through Lulu and not through a "traditional" publisher. That seems even more telling than using a cell phone as a writing tool.

Monday, July 16, 2007

Simon & Schuster digitizing backlist

I received an email newsletter from a digital content services vendor today that included an interesting promotional note -- Simon & Schuster is teaming with Innodata Isogen to digitize and convert thousands of backlist titles spanning more than 80 years of publishing history.

The interview included in the article describes the project in a couple of ways -- as a form of preservation reformatting, given potential deterioration of the print, and as support for future electronic distribution ventures.

I was particularly taken with their brief description of their selection criteria:

"... we selected both recently published and perennial bestsellers as well as deep backlist with low, but consistent, demand. We prioritized titles based on several factors including backlist sales numbers, “long tail” consumer interest (based on search engine traffic and backorder volume from retailers and distributors) and unexercised eBook publishing rights. We also took into account certain risks to the rich and varied backlist we offer that warranted pushing titles to the front of the line – including waning inventory availability and other factors."

It's not an open access project (they started publishing in 1924), but not every digitization project is. They're frank in saying that their goal is to open up new sales avenues. I'm not sure about their statement that they will be "protecting the intellectual property of our authors and illustrators." I can only assume they mean that by making legal digital versions they will be curtailing illegal digitization and distribution.

It certainly was effective marketing for S&S -- it got my attention.

Open Library

I haven't had much of a chance to explore this yet, but the Internet Archive has announced a demo of its Open Library project:

http://demo.openlibrary.org/

It aims to be an open catalog of works in print. It's a wiki of sorts, where users can edit the metadata in the catalog. If a work is available online, there can be a link to it. Cover images can be added, as can summaries, reviews, and in some cases the first chapter.

From their site:

It would take catalog entries from every library and publisher and random Internet user who is willing to donate them. It would link to places where each book could be bought, borrowed, or downloaded. It would collect reviews and references and discussions and every other piece of data about the book it could get its hands on.

But most importantly, such a library must be fully open. Not simply "free to the people," as the grand banner across the Carnegie Library of Pittsburgh proclaims, but a product of the people: letting them create and curate its catalog, contribute to its content, participate in its governance, and have full, free access to its data. In an era where library data and Internet databases are being run by money-seeking companies behind closed doors, it's more important than ever to be open.

The metadata follows their own schema, transformed from MARC and other formats. They're gathering records from everywhere they can. I don't know how much people will edit metadata, but I expect we'll see people enrich data with links and subjects. They've linked to OCA books in the demo, as well as to buying options. I assume that this service will include links to other available digital versions.

Check out the guided tour at http://demo.openlibrary.org/tour.

Sunday, July 15, 2007

study on copyright term

I came across a posting on Ars Technica about a study by a Cambridge University PhD candidate in economics on the optimal terms for copyright.

According to Rufus Pollock's paper [PDF], the optimal term is 14 years, calculated on the premise that the optimal term for copyright drops as the costs of producing creative work go down. New digital tools make it continually less expensive to produce, reproduce, and distribute works, so the costs of creative work have dropped and will continue to do so. His work presents a set of equations for the length of the copyright term, using what empirical data there is to determine the "break even" point for the value of a work in copyright. His proposed optimal term is designed to strike a balance between the incentive to create new work and the social good that comes from works entering the public domain. It's a very interesting set of assumptions and some even more interesting math. I expect to see this attacked, though, not only for its conclusion, but because there is a lack of empirical data for some aspects of his calculation: the data just isn't available to use. He makes plausible estimates and defensible justifications, but I still expect that will get his work attacked. A shame.

Saturday, July 14, 2007

collection development

This past Thursday and Friday I attended a workshop on basic collection development taught by Peggy Johnson from the University of Minnesota, who literally wrote the book on collection development.

It was a great group, mostly from academic libraries, but also from public libraries and government libraries. I went because I thought I could use some additional grounding in traditional collection development -- print and serials -- which I deal with not at all.

I felt somewhat like the odd person out. Almost none of the other attendees dealt with digital collections other than e-journals. While there was a discussion of institutional repositories and open access, there wasn't really any talk of collection development strategies for IRs. When I raised issues of collecting directly from faculty and born-digital scholarship, there was agreement that this was a developing area of concern, but it didn't seem to be relevant yet for most of the attendees.

It was in no way a bad workshop. On the contrary -- she covered a lot of ground, and there were many lively group activities and discussions. I recommend it for folks who want to learn the basics. I discovered that I already knew a lot more about what was covered than I expected to. My experiences with collection reviews at museums and archives where I've worked in the past, my time on the Collections Group at UVA, and my work with the digital collections had served me well. It should not be a surprise that the core activities -- analysis and assessment and outreach -- really are very much the same for digital collections as they are for "traditional" collections.

Friday, July 06, 2007

Google exposes plain text

An interesting post from the "Inside Google Book Search" blog:

http://booksearch.blogspot.com/2007/07/greater-access-to-public-domain-works.html

For the works that Google deems to be in the public domain, they are exposing the full "text layer" of the work. This is not the page images, but what one assumes is the output of their OCR process. The plain text is organized into pages and retains punctuation and line breaks. You can toggle between the plain text and page image view.

This is a huge breakthrough in terms of accessibility -- the principal reason that they cite for the new feature. What is slightly disappointing is that you cannot download the plain text version to use with offline reader tools -- you have to read online. That could be burdensome for some.

It would also be nice someday if the downloaded PDFs included the plain text behind the scenes to support searching within the PDF.

Tuesday, July 03, 2007

honors from Computerworld

I am thrilled to see that the Getty Trust has won a 2007 Achievement Award in Media, Arts, and Entertainment from Computerworld for the Getty Vocabularies. I'm really struck that the nominating company was Unisys. Quoting:

In addition to being essential resources for the documentation of art, architecture, and material culture, the Getty vocabularies are also invaluable lookup tools and knowledge bases, and powerful searching assistants that increase both precision and recall in online queries.

I worked at the Getty Research Institute when the vocabularies were first being built and released -- we were cataloging and creating controlled vocabulary from our own experiences that we submitted to the project. When I worked at the Historic New Orleans Collection, we submitted a lengthy list of Mardi Gras-related terms and documentation. Every organization that I have worked at for the past 20 years has used the vocabularies. Talk about impact! The award is well-deserved.

Also on the list of awardees is Laura Campbell, Associate Librarian for Strategic Initiatives at the Library of Congress. Quoting:

... the EMC Information Leadership Award recognizes individuals whose personal leadership has made a critical contribution to the effective use of information technology throughout the world.

She had key roles in developing the Library's digitizing initiatives, "American Memory," and NDIIPP. Another well-deserved award.

http://www.cwhonors.org/archives/2007/index.htm

Syndetic Solutions ICE

Librarian in Black pointed to the new Syndetic Solutions release of its ICE product.

We have a subscription to Syndetic through SirsiDynix, so book covers, tables of contents, and reviews appear in record displays in iLink. The new product claims to support direct searching of the Syndetic content in addition to the MARC content. Previously the Syndetic content was just part of the display, retrieved from a separate source at the time of the search. This could be very promising.

The more exciting tangential selling point is that this product is available directly to the Library and works with a discovery product separate from the ILS. It only works with AquaBrowser, but that's a step in the right direction. A widely used value-added discovery content service is now unbundled from the ILS environment.
