Digital Eccentric: July 2008

Wednesday, July 30, 2008

Hooray -- Fedora 3.0 released

Hooray -- the highly anticipated (at least by me) formal release of Fedora 3.0 is available.

Excerpted from the press release:

Fedora 3.0 features the Content Model Architecture (CMA), an integrated structure for persisting and delivering the essential characteristics of digital objects in Fedora. The software is available at
<http://www.fedora-commons.org/> and at <http://sourceforge.net/projects/fedora-commons>. The Fedora CMA plays a central role in the Fedora architecture, and in many ways forms the overarching conceptual framework for future development of Fedora Repositories. Fedora 3.0 features include:
Content Model Architecture - Provides a model-driven approach for persisting and delivering the essential characteristics of digital content in Fedora
Fedora REST API - A new API that exposes a subset of the Access and Management API using a RESTful Web interface contributed by MediaShelf
Mulgara Support - Fedora supports the Mulgara 2.0 Semantic Triplestore replacing Kowari
Migration Utility - Provides an update utility to convert existing collections for Content Model Architecture compatibility
Relational Index Simplification - The Fedora schema was simplified making changes easier without having to reload the database and significantly increasing scalability
Dynamic Behaviors - Objects may be added or removed dynamically from the system moving system checks into run-time errors
Error Reporting - Provides improved run-time error details
Multiple Owner as a CSV String - Enables using a CSV string as ownerID and in XACML policies
Java 6 Compatibility - Fedora may be optionally compiled using Java 6 while retaining support for Java Enterprise Edition 1.5 deployments
Relationships API - API-M has been extended to enable adding, removing, and discovering RDF relations between Fedora objects
Revised Fedora Object XML Schemas - The new schemas are simpler, supporting the CMA and removing Disseminators
Atom Support - Fedora objects can now be imported and exported in the Atom format
Messaging Support - Integrates JMS messaging for sending notification of important events
Validation Framework - Provides system operators a way to validate all or part of their repository, based on content models
3.0-Compatible Service Releases - New versions of the OAI Provider and GSearch services are compatible with Fedora 3.0. The GSearch release also enables messaging support for GSearch, which allows for more robust and seamless integration with the Fedora repository.

I have been waiting for the CMA for some time -- this update to the architecture will greatly improve the flexibility of a Fedora implementation by removing the tight bindings between objects and disseminators and allowing for easier disseminator updating. The validation support is also key if one is interested in working with a workflow engine to automatically process tasks and validate production outcomes. I am intrigued by the Atom support -- Dan commented on BagIt/SWORD as a possible repository SIP in one of our discussions at RepoCamp. This could become a very real experiment.

Tuesday, July 29, 2008

Cuil

I finally got through to Cuil, the proclaimed Google threat, this afternoon. After reading Siva's post, I decided to try searching my own name in quotation marks.

I know that there are other Leslie Johnstons. There's a sales and marketing consultant, a renowned Scottish footballer from the 1940s-50s, a cancer researcher, etc. We all came up in the first 8 pages that I reviewed. Cuil obviously pulls images that it finds on various sites and associates them with results. In some cases, it correctly included my image with a link that I was associated with. In other cases my photo showed up alongside results for the other Leslie Johnstons. On the very first page a result for an article of mine in D-Lib was accompanied by a photo of my friend and colleague Sarah Shreeves, who had an article (and hence a bio with a photo) in the same issue. On another page a result for a different D-Lib article was accompanied by a photo of Chris Awre. Other links for presentations I gave are accompanied by a portrait of Thomas Jefferson (I assume it keyed in on "University of Virginia"). For some links where there were no images there are little images of top nav bars from the page the result points to. In one case there are results from a usability site at UVA with my name on it, but the accompanying image is something that looks like a nav element in Cyrillic, which is definitely NOT on that usability site. Very weird and random.

When I tried the categories -- both of which were for Scottish footballers -- I still got lots of results of mine. I think I can say that none of my presentations or writing on digital library or museum activities ever mentioned Scottish or even American football.

Cuil needs some work. How will it learn?

Wordle

Everyone else has been playing with Wordle for weeks now. I kept thinking about what I might pipe through it, and finally landed on the UVA Repository Case Study that I submitted to Open Repositories 08. I like what it produced:

on NYRB article about Google Books

Jean-Claude Guédon and Boudewijn Walraven submitted letters to the New York Review of Books which have been published as "Who Will Digitize the World's Books?" They are commenting on Robert Darnton's "The Library in the New Age", and he responds to their letters.

Fedora and DSpace collaboration

A press release hit the streets today about a formal collaboration between the Fedora Commons and the DSpace Federation. Excerpt from the press release:

The decision to collaborate came out of meetings held this spring where members of DSpace and Fedora Commons communities discussed multiple dimensions of cooperation and collaboration between the two organizations. Ideas included leveraging the power and reach of open source knowledge communities by using the same services and standards in the future. The organizations will also explore opportunities to provide new capabilities for accessing and preserving digital content, developing common web services, and enabling interoperability across repositories.

In the spirit of advancing open source software, Fedora Commons and DSpace will look at ways to leverage and incubate ideas, community and culture to:

1. Provide the best technology and services to open source repository framework communities.

2. Evaluate and synchronize, where possible, both organizations' technology roadmaps to enable convergence and interoperability of key architectural components.

3. Demonstrate how the DSpace and Fedora open source repository frameworks offer a unique value proposition compared to proprietary solutions.

The announcement came on the heels of an event sponsored by the Joint Information Systems Committee's (JISC) Common Repository Interface Group (CRIG) held at the Library of Congress. The event, known as "RepoCamp," was a forum where developers gathered to discuss innovative approaches to improving interoperability and web-orientation for digital repositories. Sandy Payette, Executive Director of Fedora Commons, and Michele Kimpton, Executive Director of the DSpace Foundation, reiterated their commitment to collaboration and encouraged input and participation from both communities as work gets underway.

The full press release is available. There's a great photo of Sandy and Michele in a ceremonial handshake at LoC. Sandy and Michele led a brief discussion about this last Friday at RepoCamp, and it was exciting to watch this initiative launch.

Monday, July 28, 2008

Blow Up for Flickr

Through a post on ReadWriteWeb, I found Blow Up.

Blow Up uses the public Flickr API to create a slide show presentation that allows your images to be seen in fullscreen mode while still showing thumbnails of the other images in the slideshow and navigation to your sets, while maintaining your image quality (or, in my case, showing me that, when blown up, a lot of my pictures are not quite focussed). It's a very clean UI.

The service is free and doesn't require anything other than your Flickr username to get started (I do wonder if they're storing those). Because you are not logging in, Blow Up only shows images that you have set to public viewing. Other functionality includes being able to download the Blow Up app to display your Flickr images on your other websites.

RepoCamp

Last Friday I spent all day at RepoCamp. There were at least 40 participants from I don't know how many institutions! Major kudos to David Flanders from the JISC Common Repository Interface Group (CRIG) who did a fabulous job organizing the event, and to Ed Summers who facilitated the LoC side.

There were a lot of great discussions around SWORD and OAI-ORE. I was happy to have the opportunity to talk about BagIt with a group who hadn't encountered it yet, and we had some really interesting discussions. Talking through use cases beyond our initial simple use case -- files from Institution A are transferred to Institution B and stored for preservation with no active access -- there is an obvious need for BagIt profiles that specify what is contained in a Bag and how it's organized for other uses -- like potentially as a SIP for ingest into a repository. Folks were also really interested in the idea of "Holey Bags" where the manifest is a list of URIs for retrieving files. Ideas were batted around about crawls that start out with a minimal manifest of URIs, capture those files, generate checksums, follow links from those files to capture more files and checksums, ending up with a Bag generated on-the-fly from that crawl so you have captured the files and a record of the URIs where the files were found. A really interesting suggestion was the use of an OAI-ORE Resource Map to instantiate such a capture, or, for that matter, to serve as the fetch file. Bags of course can simply be files on disk, but when it's a Bag of web resources you might want more structure than just a list of locations the files came from. After listening to the discussions I'm convinced that the work we're doing (I should say Ed is doing) with a web app for a Bag deposit service that uses SWORD is going in the right direction. I think we're developing some real traction with BagIt.
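To make the "Holey Bag" idea a little more concrete, here is a minimal Python sketch of the end of such a crawl: given the bytes already captured from each URI, it writes the text of a payload manifest (a checksum and path per file) and a fetch file (URL, length, and path per file). The tag-file names and line layouts (`bagit.txt`, `manifest-sha256.txt`, `fetch.txt` with `URL LENGTH FILENAME` lines) follow the BagIt specification as I understand it (later formalized as RFC 8493), so treat them as an assumption. The `payload_path` naming rule is purely hypothetical, and the crawl itself (fetching and following links) is left out.

```python
import hashlib
import posixpath
from urllib.parse import urlparse


def payload_path(url: str) -> str:
    """Hypothetical naming rule: store each captured file under data/<host>/<path>."""
    parts = urlparse(url)
    path = parts.path.lstrip("/")
    if path == "" or path.endswith("/"):
        # Directory-style URLs get a default file name so every entry is a file.
        path += "index.html"
    return posixpath.join("data", parts.netloc, path)


def build_holey_bag(captured: dict[str, bytes]) -> dict[str, str]:
    """Return tag-file contents for a bag whose payload can be re-fetched by URL.

    `captured` maps each URL to the bytes retrieved from it during a crawl.
    """
    manifest_lines = []
    fetch_lines = []
    for url, content in sorted(captured.items()):
        path = payload_path(url)
        digest = hashlib.sha256(content).hexdigest()
        # Manifest line: checksum, whitespace, payload path.
        manifest_lines.append(f"{digest}  {path}")
        # Fetch line: where the file came from, its size in bytes, where it goes in the bag.
        fetch_lines.append(f"{url} {len(content)} {path}")
    return {
        "bagit.txt": "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        "manifest-sha256.txt": "\n".join(manifest_lines) + "\n",
        "fetch.txt": "\n".join(fetch_lines) + "\n",
    }


if __name__ == "__main__":
    tags = build_holey_bag({
        "http://example.org/report.pdf": b"%PDF-1.4 ...",
        "http://example.org/": b"<html>...</html>",
    })
    print(tags["fetch.txt"])
```

Keeping the checksums alongside the fetch list is what lets a receiving institution verify that what it later retrieves from those URIs matches what the crawl originally captured.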

They video recorded all the elevator pitches and the reports to the whole group. I don't know if, where, or when they'll be available. I am not a fan of seeing myself on video.

It was nice to see some of the Fedora team -- Sandy Payette, Dan Davis, Eddie Shin -- and some UVA colleagues. It's only been 3 or so months but it feels like I left so long ago. I was pleased to meet Ben O'Steen from Oxford (we were following his Fedora IR work when I was at UVA) but I didn't get the chance to really sit down and talk with him. I had so many other interesting conversations that I need to follow up on ...

Tuesday, July 22, 2008

what is a repository?

Yesterday a colleague was chatting with me about what makes up a repository. Have we been overthinking what is needed? Can we simplify the tools we use? Recombine lightweight tools in a new way?

This was very timely because I'd seen a posting that JISC's Information Environment team is experimenting with IdeaScale to have a discussion about defining repositories to feed into JISC work on repository architecture.

First -- about IdeaScale:

It begins with an idea posted to your IdeaScale community by a user. Each idea can be expanded through comments by the community. The ultimate measure of an idea is determined by a voting system. Any idea can be voted to the top or buried back down to the bottom. It combines the "wisdom of the crowds" concept with Web 2.0 models like Digg.

I think it's interesting that JISC is trying this approach -- have discussants set out a series of statements about repositories, allow comments, and let members of the community sign up to vote +1 or -1 on the positions.

I have a love/hate relationship with the word "repository." It's next to impossible to define or describe, but I haven't been able to come up with anything better. I'm not sure this activity will produce any solid definitions, but it is generating a very interesting public discussion.

Friday, July 18, 2008

Names Project

I'm intrigued by the Names Project to identify requirements and develop a prototype service that will reliably and uniquely identify individuals and institutions for institutional and subject repositories in the UK. They report anecdotally that more than 75% of authors represented in IRs aren't in LCNAF. The goal is a straightforward and laudable one: a centralized name authority module that will plug into existing and future repository software and provide autocompletion of author names for depositors of materials and for searchers of the systems.

I found this paper by Amanda Hill to be the best introduction to the project. The project has just issued a software specification and I plan to watch its progress.

Thursday, July 17, 2008

fail whale art

I'm not a twitter user (please don't start on me), so I was completely unaware of what the "Fail Whale" is. There's a wonderful post on ReadWriteWeb on the artist behind the graphic and how her work took on a new life through the twitter community.

international copyright law and digitization

In one of those great synchronicities, I've encountered two publications on international copyright law and digitization, both of which are worth reading.

The first is an Information World Review article entitled "Scan and Deliver" about how issues of copyright clearance have affected the British Library's digitization program. (I keep hearing Adam Ant's "Stand and Deliver" in my head.)

The second is the International Study on the Impact of Copyright Law on Digital Preservation just released by the Library of Congress. The report is a joint effort of the Library of Congress National Digital Information Infrastructure and Preservation Program, the Joint Information Systems Committee, the Open Access to Knowledge (OAK) Law Project, and the SURFfoundation.

Wednesday, July 16, 2008

ask a person next time

From BoingBoing, a very funny pointer to a Chinese restaurant that relied on an online translation service when they shouldn't have ...

Tuesday, July 15, 2008

ndiipp partners meeting

Last week I attended the three-day 2008 meeting for the partners in the Library of Congress National Digital Information Infrastructure and Preservation Program (NDIIPP). Yesterday a colleague who couldn't attend asked me what stood out for me in the program. I didn't take a lot of notes -- I kept forgetting to because I just wanted to listen -- but I see some patterns in the cryptic, poorly-keyed memo on my Centro. (Note to organizers -- get more wireless connections next time. I didn't bother with my laptop because there was very little chance of getting on the network.)

Private LOCKSS Networks were everywhere: MetaArchive, the Arizona State Library and Archives' PeDALS, Data-PASS, ETD preservation, and, of course, journal content. It's interesting to see the LOCKSS distributed and self-replicating architecture being used for all types of content.

Distributed and/or replicated storage overall was definitely a trend. iRODS was mentioned in several sessions, I learned more about Dataverse, and I attended a meeting with the FACIT partners.

The packaging and transfer of files between institutions was discussed quite a bit. I was pleased to see the positive reaction to the BagIt package standard that LoC has been working on, which has been put into use with some NDIIPP partners including CDL and Stanford. I was really intrigued with a presentation that Tom Habing did on the ECHO DEPository project. I've seen it presented before, but something really clicked this time when I saw their Hub and Spoke architecture and listened to him talk about packaging between systems and services.

What really stuck with me was something that Micah Altman from Harvard said. He was discussing selection for digital preservation and declared that we need to "select the selectors" in identifying what should be preserved, because we can't save everything. If we identify key researchers and tie preservation to their research, we're assured of capturing at least some vital resources. But there are so many disciplines that no one institution can identify what should be preserved, so the corollary need is for many, many institutions to involve themselves in selection and preservation, so there is more preservation coverage for the future. I was glad to hear selection described as a necessary activity.

Wednesday, July 09, 2008

saw a kindle today

The man who sat down next to me on the Metro this morning had a Kindle. While he really just wanted to read, he politely answered a couple of questions and let me hold it. It feels lighter than I expected, and the screen is reasonably clear and high contrast. It's also not as ugly as I thought. Each screen shows about 3 paragraphs of text (better than my Palm), but the entire screen flashes every time you navigate to a new page.

If anyone who has a Kindle is interested, he was reading Ewan McGregor's Long Way Round.

Monday, July 07, 2008

is Google identifying more full text works for GBS?

Barbara Quint at Information Today wonders -- Is Google Book Search Targeting More Books for Public Domain?

Pretty much no U.S. library will make post-1922, probably in-copyright digitized material from their collections available on the open web without varying levels of risk assessment that includes a review of copyright renewal records. Now that Google has developed its own copyright renewal data, will it use that data to identify works that should be in the public domain? And will they make those works available as full-text in Google Book Search? And will they share their research results with the rest of the community so we can free our digitized copies, too?

Wednesday, July 02, 2008

Mbooks Collections

The BLT announced a new feature in MBooks at Michigan -- Collections Pages.

I love that I can browse collections that others have made public -- Perry has a great Gothic Literature collection going (no public domain Castle of Otranto yet?) -- and that you can view the full collection or limit your view to the full-text items. I also love that I can copy items into my own collections to help populate them. I could hardly find any MBooks items in Mirlyn to build my first attempt at a collection. Not too many public domain texts on voodoo are available yet...

PDF now an ISO standard

PDF is now officially an ISO standard, finally joining PDF/A (ISO 19005-1). More details are offered in the press release from the ISO.

The Portable Document Format (PDF), undeniably one of the most commonly used formats for electronic documents, is now accessible as an ISO International Standard - ISO 32000-1. This move follows a decision by Adobe Systems Incorporated, original developer and copyright owner of the format, to relinquish control to ISO, who is now in charge of publishing the specifications for the current version (1.7) and for updating and developing future versions.

You can read the description of the standard.

blogging from excavations

As reported by The Chronicle of Higher Education, students in the Cotsen Institute of Archaeology Field Program at UCLA will be blogging from seven of the school's excavation sites in Albania, Canada, Chile, Ecuador, Panama, Peru, and the U.S.

As a graduate alumna of the UCLA Archaeology department, I am thrilled to see this increased visibility for the program, as well as for the practice of archaeology. I know firsthand how challenging it is to explain what it is you actually do in the field...
