Friday, September 28, 2007

cool securing tool for kids on the internet

I just saw a commercial for the Fisher Price Easy Link Internet Launch Pad, targeted at children three and up. The Easy Link -- a specialized USB peripheral with its own software -- allows children to explore sites dedicated to characters when they plug a figure of that character, like Elmo, into the Launch Pad. The kids are offered links to read and games to play, and nothing more -- there is no access to the rest of the Internet or to the hard drive and any of its applications without a password. It's $30, which is a reasonable price to introduce kids to working with a computer while limiting their access to anything they can damage or that can harm them.

And yes, kids that young do use computers. I remember watching my cousin's daughter playing computer games when she was four. But of course, both her parents are software engineers.

follow-up to virtual strike

Read the report and see the screen shots from the virtual strike in Second Life.

award for digital preservation tool

A press release was circulated via email announcing that DROID, a tool from The National Archives in London, had won the 2007 Digital Preservation Award.

From the press release:

An innovative tool to analyse and identify computer file formats has won the 2007 Digital Preservation Award.

DROID, developed by The National Archives in London, can examine any mystery file and identify its format. The tool works by gathering clues from the internal 'signatures' hidden inside every computer file, as well as more familiar elements such as the filename extension (.jpg, for example), to generate a highly accurate 'guess' about the software that will be needed to read the file.

Identifying file formats is a thorny issue for archivists. Organisations such as the National Archives have an ever-increasing volume of electronic records in their custody, many of which will be crucial for future historians to understand 21st-century Britain. But with rapidly changing technology and an unpredictable hardware base, preserving files is only half of the challenge. There is no guarantee that today's files will be readable or even recognisable using the software of the future.

Now, by using DROID and its big brother, the unique file format database known as PRONOM, experts at the National Archives are well on their way to cracking the problem. Once DROID has labelled a mystery file, PRONOM's extensive catalogue of software tools can advise curators on how best to preserve the file in a readable format. The database includes crucial information on software and hardware lifecycles, helping to avoid the obsolescence problem. And it will alert users if the program needed to read a file is no longer supported by manufacturers.

PRONOM's system of identifiers has been adopted by the UK government and is the only nationally-recognised standard in its field.

The judges chose The National Archives from a strong shortlist of five contenders, whittled down from the original list of thirteen. The prestigious award was presented in a special ceremony at The British Museum on 27 September 2007 as part of the 2007 Conservation Awards, sponsored by Sir Paul McCartney.

Ronald Milne, Chair of the Board of Directors of the Digital Preservation Coalition, which sponsors the award, said: "The National Archives fully deserves the recognition that accompanies this award."
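
To make the press release's description a little more concrete, here is a minimal sketch of the general idea: check a file's leading bytes against known internal signatures first, and fall back to the filename extension only when no signature matches. This is not DROID's code and the table is not the PRONOM registry; the function name `guess_format`, the tiny signature table, and the labels are all my own assumptions for illustration.

```python
from pathlib import PurePath

# Illustrative table only: a handful of well-known leading byte sequences.
# This is NOT the PRONOM registry and NOT DROID's signature format.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
]

# Fallback used only when no internal signature matches.
EXTENSIONS = {
    ".png": "PNG image",
    ".jpg": "JPEG image",
    ".jpeg": "JPEG image",
    ".gif": "GIF image",
    ".pdf": "PDF document",
}


def guess_format(filename, head):
    """Return (format, basis) given a file name and the file's first bytes."""
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name, "signature"
    ext = PurePath(filename).suffix.lower()
    if ext in EXTENSIONS:
        return EXTENSIONS[ext], "extension"
    return "unknown", "none"
```

In this sketch the signature wins over the extension, so a PNG that has been renamed to `photo.jpg` would still be reported as a PNG. A real tool would presumably weigh many more clues than this.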

Thursday, September 27, 2007


I spent a little time this afternoon reading up on the newly released digital preservation tool Xena.

You can point it at a directory of diverse file types and it will convert the files into normalized open formats. The list of supported formats and the conversion outcomes is available in the help docs.

This is potentially a really useful workflow tool but there's a lot to examine here. I don't know how scriptable it is. You can write plugins to add in new formats -- I'm not yet sure if you can change conversion decisions and alter the target formats. Why is the target format for pretty much every image format PNG? Could we change that to TIFF or JPEG2000 if we were willing to write the plugin? It runs on Windows and Linux and requires OpenOffice. On Linux, does it require a graphical environment, or can you run it from the command line?

I'm thinking that this could be really useful for an IR, but I'm not yet sure if it will scale for Library-wide preservation or collection repositories.

Monday, September 24, 2007

archives on the web

Technophilia lists Where the Web Archives Are. Here's what they say:

Some of the most intriguing resources on the web are located in archives—compilations of data that in the past, could only be found by making appointments in dusty libraries. Today, I'm going to take you on a quick tour through some of the most fascinating archives on the web.
So where are they? If I am reading the list correctly, they're pretty much not at any academic libraries.

In the "Government" section, there is the National Archives and the Library of Congress. There is the Internet Archive, which is indeed a library. There's the Rockefeller Archive. There's NASA. There's David Rumsey, possibly the best private map archive in the world. There is the British Library.

Otherwise, it's Calvin and Hobbes, Smithsonian Magazine, the Smoking Gun, and The Balcony Archives of movie reviews.

I don't want to knock their list -- it's an interesting list full of great collections of very worthwhile content. But where are all the other myriad Library special collections and archives on this list? Is it that we aren't visible enough? Or perhaps not cool enough compared to PBS's Nova? Where are our extensive online archives on runaway slaves or civil rights or early American literature? Or political cartoons or penny dreadfuls or sheet music? Or puzzles or jazz or the Civil War?

I think we have to remember that our target audience is not just our very local community, but the global community, including non-academics. We all need to think a bit more about how to get the word out about what we've made freely available. Being available in a Google search isn't proactive enough. We need to work to get noticed.

Friday, September 21, 2007

not the usual google lawsuit

As seen at TechCrunch, a Pennsylvania resident is suing Google for crimes against humanity and is asking the court for $5 billion in damages because his social security number, when turned upside down and scrambled, spells Google. His handwritten filings are on the Justia site.

Tuesday, September 18, 2007

virtual strike

The first virtual strike is taking place soon. Apparently there are labor actions planned by the union representing Italian employees of IBM over pay negotiations -- as one of their strategies they plan to picket the company's campus in Second Life. They're even providing orientation for IBM employees who are new users. I wonder what the corporate reaction will be? The press this action is getting is pretty intensive.

new york times open access

The story of the day seems to be about the NY Times opening up its archives. So far I've seen postings at boing boing, if:book, open access news, o'reilly radar, and teleread.

So why am I bothering to blog this? Because this made me think about something I blogged about some months ago -- Google News Archive Search. One of the things that galled me at the time was how much of what they indexed was behind a pay firewall. Now, the NY Times is opening almost all their content up (save for 1923-1986), making this a more useful service, at least for resources from one newspaper. If only there weren't so much other for-fee public domain newspaper content controlled through ProQuest Archiver. I still hope for an OpenURL Resolver service so authorized users can get to authorized resources at ProQuest Historical Newspapers instead.

Saturday, September 15, 2007

career meme

Jerry blogged about the results he received from a test at Career Cruising. Since I was sitting at home on a Saturday afternoon, it seemed the thing to do. I dutifully answered the questions and the follow-up questions, and I just about fell off the sofa when I got the results:

1. Anthropologist
2. Video Game Developer
3. Multimedia Developer
4. Scientist
5. Picture Framer
6. Political Aide
7. Computer Animator
8. Interior Designer
9. Business Systems Analyst
10. Website Designer
11. Market Research Analyst
12. Librarian
13. Medical Illustrator
14. Artist
15. Real Estate Appraiser
16. Computer Programmer
17. Set Designer
18. Cartographer
19. Animator
20. Costume Designer
21. Cartoonist / Comic Illustrator
22. Illustrator
23. Mathematician
24. GIS Specialist
25. Epidemiologist
26. Dental Assistant
27. Statistician
28. Economist
29. Graphic Designer
30. Desktop Publisher
31. Historian
32. Archivist
33. Curator
34. Web Developer
35. Public Policy Analyst
36. Esthetician
37. Hairstylist
38. Technical Writer
39. Makeup Artist
40. Webmaster

I have no idea how their questions led their system to tell me that I should be an anthropologist. Apparently I did select the correct course of study in college and graduate school! Archivist, curator, web designer and developer, and tech writer are all familiar activities to me. I did my share of amateur theatrical work years ago. This was uncannily on target.

But where did dental assistant come from? Or esthetician? Picture framer? Political aide? I just cannot imagine any of those are for me.

Friday, September 14, 2007


I don't think that there is much that I can add to this excellent review of oSkope at if:book. I spent some time at oSkope exploring their Flickr search. The mouseover shows the title and date for the image, plus whose collection it came from. If you click on the image, a popup appears that includes the above plus the tags and a zoomable thumbnail. There's a slider at the right that changes the number of images that appear in the grid -- from 4 to 500. The grid, stack, pile, and list views are great -- but I'm not sure what the axes are for the graph view.

I like the idea of the drill-down navigation through the eBay categories, but as noted in the if:book entry, it didn't seem to be working and kept returning no items.

The oSkope User Agreement (pdf) accompanies the language "Use of this website consitutes [sic] acceptance of the oSkope User Agreement and Privacy Policy. Please read these agreements carefully." At six pages it is thorough. There's also a four-page privacy policy (pdf).

Monday, September 10, 2007


In January I saw a presentation by Julie Allinson at Open Repositories on the UKOLN Repository Deposit Service work. Phil Barker of CETIS has a blog entry on a number of repository standards topics, one of which is SWORD (Simple Web-service Offering Repository Deposit), the project which takes forward the work I saw presented. The goal is to take their deposit protocol and implement it as a lightweight web service using a prototype "smart deposit" tool for four repository software platforms: EPrints, DSpace, Fedora and IntraLibrary. They're taking advantage of the Atom Publishing Protocol and extending it, which seems like a smart direction to me. I'm looking forward to seeing more of this.
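
For anyone who hasn't looked at the Atom Publishing Protocol, the basic move is to create a resource by POSTing an Atom entry to a collection URI. Below is a minimal sketch of plain AtomPub only; it does not show any of the SWORD extensions, which I haven't seen specified yet. The collection URL, the entry values, and the helper names `build_entry` and `build_deposit_request` are all hypothetical, and the request is only constructed, never sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM_NS)


def build_entry(title, author, entry_id, updated, summary):
    """Serialize a minimal Atom entry describing the item being deposited."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    author_el = ET.SubElement(entry, f"{{{ATOM_NS}}}author")
    ET.SubElement(author_el, f"{{{ATOM_NS}}}name").text = author
    ET.SubElement(entry, f"{{{ATOM_NS}}}summary").text = summary
    return ET.tostring(entry, encoding="utf-8")


def build_deposit_request(collection_url, entry_xml):
    """Prepare (but do not send) the POST that asks a collection to accept the entry."""
    return urllib.request.Request(
        collection_url,
        data=entry_xml,
        headers={"Content-Type": "application/atom+xml;type=entry"},
        method="POST",
    )


# Hypothetical values throughout; nothing here names a real repository endpoint.
entry_xml = build_entry(
    title="Sample deposit",
    author="A. Depositor",
    entry_id="urn:uuid:00000000-0000-0000-0000-000000000000",
    updated="2007-09-10T12:00:00Z",
    summary="A placeholder description of the deposited item.",
)
request = build_deposit_request("http://repository.example.org/collection", entry_xml)
```

Under AtomPub, a server that accepts the entry responds with 201 Created and a Location header for the new resource. How each of the four platforms maps that onto its own ingest process is exactly the part I'm curious to see.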

Sunday, September 09, 2007

UNESCO open source repository report

UNESCO has issued a very interesting report -- Towards an Open Source Archival Repository and Preservation System -- that defines the requirements for a digital archival and preservation system and describes a set of open source software which can be used to implement it. It focuses on DSpace, Fedora, and Greenstone, principally comparing the three systems in their support for OAIS. The report uses as the basis for its comparison a single use case -- the management and preservation of images.

I think it's a very fair report, not deeply technical, but an overview of the capabilities of the tools. Fedora is well-reviewed, with some shortcomings mentioned -- it takes a high level of programming expertise to contribute to the core development (true), the administrative reporting tools could stand some improvement (I could use granular use statistics), and there is a lack of built-in automated preservation metadata extraction and file format validation. On those last two points, the Fedora architecture very easily supports the integration of locally developed automated processes for metadata extraction and format validation into object preparation. That's what we have done. That Fedora has supported checksum checking since version 2.2 is a huge step for file preservation.
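
For readers who haven't worked with checksums, the underlying idea is simple: record a digest of the file when it is ingested, then recompute it later and compare. Here is a minimal sketch of that idea in general terms. It is not Fedora's API or implementation; the function names `checksum_of` and `fixity_ok`, the choice of SHA-256, and the sample bytes are my own assumptions.

```python
import hashlib
import io


def checksum_of(stream, algorithm="sha256", chunk_size=65536):
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(stream, recorded_checksum, algorithm="sha256"):
    """True when the content still matches the checksum recorded at ingest."""
    return checksum_of(stream, algorithm) == recorded_checksum.lower()


# Hypothetical example: record a checksum at ingest, then re-check later.
original = b"page image bytes"
recorded = checksum_of(io.BytesIO(original))
unchanged = fixity_ok(io.BytesIO(original), recorded)
damaged = fixity_ok(io.BytesIO(original + b"\x00"), recorded)
```

With real files you would pass `open(path, "rb")` instead of the in-memory streams. A mismatch only tells you that the bytes changed, not why, so the recorded value needs to be stored somewhere you trust.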

Thursday, September 06, 2007

google book search features

Google Book Search has introduced a My Library feature, where you can identify volumes in GBS and books that you own and associate them with your Google account. I already had an account that I use with blogger and Google Analytics, so there was nothing to set up. I can search and easily click on an "add to my library" link. I can assign a star rating, add a review, and add labels. I don't seem to be able to see a list of labels that I've assigned. I'd like to be able to create individual sets, but there doesn't seem to be a way to do that. The export is a lightweight XML document that's lacking publication data like date or publisher. You automatically have an RSS feed. It's interesting, but I'm not sure what this gives me over LibraryThing other than URLs for the books in GBS.

The more interesting service is the ability to highlight and quote from a text in GBS. It only works with full view texts -- the tool is not available for any other view. I searched for the term I was interested in and went through 20 screens of results without finding a book that I could try the tool with. I had to resort to an advanced search for titles between 1900 and 1923 to try it. That's an interesting indicator of just how much is in GBS that's post-1923 -- none of the first 200 results in my search were in the public domain and full view.

I found a text I wanted to quote and used the tool to draw a box around the text. Drawing the box is a tad tricky -- my first two tries I didn't get the box large enough to get the first line of what I wanted to quote. I was given the option to create an image of the text block or to grab the text. I could add it to my Google Notebook or send it to blogger (because I have an account). You are also presented with a URL that you can use to embed the note in a web page. The quote includes a link to the text in GBS.

This seems really useful to me. In our paradigm at UVA we talk about how it's not enough to digitize something -- you have to be able to use it. This is the first tool I've seen from GBS where it makes its texts into something that you can really take advantage of in a networked environment.

amazon kindle

There was an article in the New York Times yesterday on ebooks that briefly mentioned two upcoming business models:

In October, the online retailer will unveil the Kindle, an electronic book reader that has been the subject of industry speculation for a year, according to several people who have tried the device and are familiar with Amazon’s plans. The Kindle will be priced at $400 to $500 and will wirelessly connect to an e-book store on Amazon’s site.

That is a significant advance over older e-book devices, which must be connected to a computer to download books or articles.

Also this fall, Google plans to start charging users for full online access to the digital copies of some books in its database, according to people with knowledge of its plans. Publishers will set the prices for their own books and share the revenue with Google. So far, Google has made only limited excerpts of copyrighted books available to its users.

The Google announcement is, I think, a fair one -- right now they limit viewing of copyrighted books to a snippet view. If a work is still clearly in copyright and the rights owner wants to release that book for full access, they should be able to charge for that access. It's their right. Of course I'd like to see more publishers make e-versions of their titles available freely ...

The Amazon news gives me pause, not knowing all the details yet. You access the files wirelessly -- do you read them via a live connection from their servers, or is the file downloaded to the device? I understand why some think it's a plus to not require a full-fledged computer to get access to a book, but it potentially seems like a really limited version of access. The ebook files will be Mobipocket format and the Kindle device seems to use a proprietary wireless system to grab the files (known through their FCC filing), so the files likely won't be available to other devices. They are not using the Adobe format for their files; it's not clear if the Kindle will support reading of Adobe ebooks from other sources or if you can only read Amazon files. Can you get the files off the device or back them up? If you can get the files off the device, will they work with the desktop version of Mobipocket? There have also been complaints about Mobipocket DRM.

This is all speculation given the lack of details. TeleRead has some speculation of their own. I look forward to hearing more about the product and the service.

Wednesday, September 05, 2007

fair use decision

Today the Tenth Circuit court ruled unanimously in favor of Larry Lessig et al. in Golan v. Gonzales, a case about the scope of fair use. The court has acknowledged that First Amendment freedoms must be considered when copyright law is made.

The government had argued in this case, and in related cases, that the only First Amendment review of a copyright act possible was if Congress changed either fair use or erased the idea/expression dichotomy. We, by contrast, have argued consistently that in addition to those two, Eldred requires First Amendment review when Congress changes the "traditional contours of copyright protection." In Golan, the issue is a statute that removes work from the public domain.

Monday, September 03, 2007

internet archive and nasa

I missed this announcement last week (even though Peter Suber blogged it) -- NASA and Internet Archive Team to Digitize Space Imagery:

NASA and Internet Archive of San Francisco are partnering to scan, archive and manage the agency's vast collection of photographs, historic film and video. The imagery will be available through the Internet and free to the public, historians, scholars, students and researchers.

Currently, NASA has more than 20 major imagery collections online. With this partnership, those collections will be made available through a single, searchable "one-stop-shop" archive of NASA imagery.


NASA selected Internet Archive, a nonprofit organization, as a partner for digitizing and distributing agency imagery through a competitive process. The two organizations are teaming through a non-exclusive Space Act agreement to help NASA consolidate and digitize its imagery archives at no cost to the agency.


Under the terms of this five-year agreement, Internet Archive will digitize, host and manage still, moving and computer-generated imagery produced by NASA.


In addition, Internet Archive will work with NASA to create a system through which new imagery will be captured, catalogued and included in the online archive automatically. To open this wealth of knowledge to people worldwide, Internet Archive will provide free public access to the online imagery, including downloads and search tools....

From an AP article on Wired News:

Kahle said the archive won't be able to digitize everything NASA has ever produced but will try to capture the images of broadest interest to historians, scholars, students, filmmakers and space enthusiasts.

Kahle said the images already in digital form represent the minority of NASA's collections, and they are scattered among some 3,000 Web sites operated by the space agency. He said those sites would continue to exist; the archive would keep copies on its own servers to provide a single, free site to augment the NASA sites.


The Internet Archive is bearing all of the costs, and Kahle said fundraising has just started. The five-year agreement is non-exclusive, meaning NASA is free to make similar deals with others to further digitize its collections.

What's particularly exciting is that this is both an aggregation and a digitization project -- widely scattered materials will be brought together for easier discovery and get enriched metadata, and important materials will be selected and digitized to add to the corpus.