Thursday, February 22, 2007

success is a double-edged sword

Since we launched our repository, its success has revealed some server issues. We uncovered capacity problems that had gone unnoticed until now, and three very patient and experienced programmers and sys admins have been working very hard to troubleshoot them. To allay fears: no, we had no problems with Fedora itself. But Fedora kept thinking that Tomcat/Cocoon wasn't responding, so it would decline to complete its disseminations. The most puzzling thing was that our image delivery worked just fine, but our texts would fail. Actually, they'd work for a while and then fail. The texts are much larger and have much more complex disseminations and rendering, so we knew it was a capacity issue of some sort. The greatly increased number of such complex renderings was causing something somewhere to intermittently give up the ghost.

We ended up doing a number of things: adding more retries when requesting objects found through text search results; moving Cocoon and Tomcat onto a newer box with upgraded versions and a lot more memory to allocate; and upping timeout limits in Tomcat and Apache. We also found a log that kept filling up, and there was a Cocoon STX bug that we needed to take into account in one text disseminator. We have one remaining mystery issue: some Apache connections that aren't being released. It may be that Cocoon errors aren't being properly identified as such and are going into a wait state.
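For anyone curious what "adding more retries" can look like, here is a minimal sketch in Python. It is not our actual code (our stack is Fedora, Cocoon, and Tomcat); the function name fetch_with_retries and its parameters are made up for illustration, and the waits between attempts grow longer each time on the assumption that an overloaded renderer needs a moment to recover.

```python
import time


def fetch_with_retries(fetch, retries=3, delay=0.5, backoff=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the retries are used up.

    fetch   -- hypothetical zero-argument callable that raises on failure
    retries -- how many times to try again after the first failed attempt
    delay   -- seconds to wait before the first retry
    backoff -- multiplier applied to the wait after each failed attempt
    sleep   -- injectable so tests can skip the real waiting
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    wait = delay
    attempt = 0
    while True:
        try:
            return fetch()
        except Exception:
            if attempt >= retries:
                # Out of retries: let the caller see the last error.
                raise
            sleep(wait)
            wait *= backoff
            attempt += 1


if __name__ == "__main__":
    # Simulated flaky request: fails twice, then succeeds.
    state = {"calls": 0}

    def flaky_request():
        state["calls"] += 1
        if state["calls"] < 3:
            raise IOError("simulated timeout")
        return "rendered text"

    print(fetch_with_retries(flaky_request, delay=0.1))
```

Retries like this only paper over intermittent failures; they did not remove the need for the memory and timeout changes.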

It's interesting what we never found in two years that we found in three weeks once more people started using the service.

Friday, February 02, 2007

unveiling of our repository

It seems like I've been working towards the unveiling of our Digital Collections Repository forever. Participating in architecture planning. Collecting functional specifications. Coordinating production standards. Watching workflows come together. Watching teams coalesce. Implementation. Meeting with faculty for testing and feedback. Working with Library staff on testing and feedback. More implementation. More testing.

Finally, after two years of alpha and beta versions, we unveiled the Digital Collections Repository yesterday. We had been calling it a launch, but after two years it's really an unveiling.

There were a couple of glitches. High traffic caused issues with text rendering, seemingly due to server timeouts. Some updates hadn't been added to the index, so a couple of faculty couldn't find some specific images. Expectations exceeded some of our search capabilities (I can't index what I don't have in the metadata). But it's out there. We can rest on our laurels for a few days, then start planning the next release and the production and delivery of additional formats. And likely an entirely new indexing infrastructure.

Today, I got a message from a faculty member whom I have never met. The selector for her department had emailed the announcement. She liked what she saw and wanted to know how to get her image collection selected and added. I am thrilled.

http://lib.virginia.edu/digital/collections/

Open Repositories 07

I traveled to San Antonio on January 23 to attend Open Repositories 07. I actually attempted to travel on January 22, but was stopped by severe fog. I almost didn't get there on the 23rd (no planes coming in the day before translates to none that can leave), and my bag didn't get there until hours after I did. But that's a lengthy entry for elsewhere.

Because of my travel woes, I missed the entire first day of Fedora sessions, including my own -- Sandy got someone to switch with me, so all was well on that front.

Wednesday morning I gave my Fedora best practices talk, focusing on the process for the development of content models. Wednesday afternoon I gave my talk on UVa's principles of digital curation. I was pleasantly surprised at the number of people who were really interested in what I had to say, requested copies of the talk, and/or asked me to give the talk at their institution. It's a framing of our goals and activities in the digital curation realm, and seems to have struck a nerve with many as a good approach. I'm in the process of expanding this material into a chapter for a book.

Not surprisingly, I'm a big fan of James Hilton. I recommend Peter Murray's synopsis on his blog, as he's already said everything that I could say.

I hope that the presentations are going to be posted, because there are a number that I recommend. Kaare Christiansen's talk on object validation strategies at the State and University Library of Denmark presented some really promising workflow tools. Atsuko Takano's talk on the CURATOR institutional repository at Chiba University discussed some interesting categories of data that they're collecting, including overlay journals, e-science, and output from alumni. MacKenzie Smith's talk on PLEDGE presented interesting experiments in policy enforcement in a grid environment. Joan Smith's talk on mod-OAI presented some interesting experiments in enabling web sites to better describe themselves for preservation purposes. Christiaan Kortekaas's talk on the Fez project is increasingly relevant to me, as I know we need to set up a self-deposit environment. Carl Lagoze's talk on the OAI Object Re-Use & Exchange (ORE) initiative clarified many issues for me. Julie Allinson's talk on the Eprints Application Profile presented an interesting FRBR model for eprint representation.

We had an exciting content model working group meeting where we appear to have become a formal-ish working group. One set of us will be working on documenting practices, content models, and disseminators. Another set of us will work on formal representations of content models in an architecture. I'm looking forward to working on the former and seeing the output of the latter.