Digital Eccentric: standards


Saturday, June 27, 2009

new BIL on SourceForge and update to BagIt spec

This week saw a couple of events around the BagIt specification and tools.

A revision of the BagIt specification went out this week. You will note that it is still 0.96 -- the revisions were only in language to clarify some questions that had been received. There are some discussions going on about 0.97 - join the Digital Curation Google group. I'd like to see some more activity there!

Version 3.0 of BIL, the BagIt Library for Java, was released on SourceForge this week. It's available as binary and source code.

Plus, there was the BagIt video ...

Sunday, April 12, 2009

museum data exchange software

OCLC, funded by the Mellon Foundation and working with the software company Cognitive Applications, Inc, has released COBOAT and OAICat Museum to support data interchange between museums. This work is happening under the auspices of their Museum Data Exchange Project.

So what, many people will say? It should already be easy to share museum data, right?

Not so much.

The museum collection management system arena has some major vendors (Gallery Systems, Willoughby, Minisis, Cuadra, etc.) and some smaller vendors (Re:discovery, PastPerfect, etc.), and countless (and I really mean countless) home-grown systems running on FileMaker, Access, and MS-SQL. I know, because I spent many years working for museums and I was on the board of the Museum Computer Network, a group that diligently worked on many interchange initiatives. I worked with software from 3 vendors and managed a FileMaker-based system. Getting data in was easy. Getting data out was often hard. Participation in data aggregation projects took a lot of effort. And most small- or medium-sized museums (and there are many, many more of them than large museums) have little or no technology staff to enable data sharing. And there is no common data schema in the community.

The museum community itself has sometimes slowed progress. When relevant library community standards were mentioned, some said "We're nothing like libraries! Our collections are unique! Their standards are not for us!" That attitude seems to have changed in the last 10 years.

I am glad to see something like this going forward. A fee-free tool that can help museums extract data from black-box vendor systems and enable sharing? Bring it on.

Thursday, February 05, 2009

DCC paper on interoperability

The Digital Curation Centre has released a short briefing paper on interoperability. It's a good, brief primer on the basic issues.

JHOVE2 requirements available

The latest version of the JHOVE2 Functional Requirements has been posted. I'm still interested in what isn't documented yet, e.g., the final list of formats that will be supported.

Friday, October 03, 2008

Federal Agencies Digitization Guidelines Initiative

The Federal Agencies Digitization Guidelines Initiative site went live on September 30, 2008. The initiative represents a collaborative effort between U.S. government agencies to establish a common set of guidelines for digitizing historical materials. Participants include the Defense Visual Information Directorate, the Library of Congress, the National Agricultural Library, the National Archives and Records Administration, the National Gallery of Art, the National Library of Medicine, the National Technical Information Service, the National Transportation Library, the Smithsonian Institution, the U.S. Geological Survey, the U.S. Government Printing Office, and The Voice of America.

The Still Image Working Group is focusing its efforts on books, manuscripts, maps, and photographic prints and negatives. There are draft "Digital Imaging Framework" and "TIFF Image Metadata" documents available. The Audio-Visual Working Group effort will cover sound and video recordings and will consider the inclusion of motion picture film as the project proceeds. That group is still at the document drafting stage.

Friday, July 18, 2008

Names Project

I'm intrigued by the Names Project, which aims to identify requirements and develop a prototype service that will reliably and uniquely identify individuals and institutions for institutional and subject repositories in the UK. They report anecdotally that more than 75% of authors represented in IRs aren't in LCNAF. The goal is a straightforward and laudable one: a centralized name authority module that will plug into existing and future repository software and provide autocompletion of author names for depositors of materials and for searchers of the systems.

I found this paper by Amanda Hill to be the best introduction to the project. The project has just issued a software specification and I plan to watch its progress.

Wednesday, July 02, 2008

PDF now an ISO standard

PDF is now officially an ISO standard, finally joining PDF/A (ISO 19005-1). More details are offered in the press release from the ISO.

The Portable Document Format (PDF), undeniably one of the most commonly used formats for electronic documents, is now accessible as an ISO International Standard - ISO 32000-1. This move follows a decision by Adobe Systems Incorporated, original developer and copyright owner of the format, to relinquish control to ISO, who is now in charge of publishing the specifications for the current version (1.7) and for updating and developing future versions.

You can read the description of the standard.

Friday, June 06, 2008

BagIt

A press release went out this week on digitalpreservation.gov about the BagIt format specification. BagIt is a lightweight specification for the description of data packages meant for transfer between institutions. The need for such a standard was initially identified in working with NDIIPP partners such as CDL (John Kunze of CDL played a major role in the format development and testing and is one of the principal authors). Other partners have expressed interest, and we are moving forward with a prototype submission web app that will take advantage of BagIt.
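For anyone who has not looked inside a bag, here is a minimal sketch in Python of the basic layout: a bagit.txt declaration, a data/ payload directory, and a manifest-md5.txt listing a checksum for each payload file. This is an illustration only, not the BagIt Library or any official tool. The function names make_bag and verify_bag are hypothetical, and the sketch ignores optional tag files such as bag-info.txt.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir: Path, payload: dict[str, bytes], version: str = "0.96") -> Path:
    """Write payload files under data/ plus bagit.txt and manifest-md5.txt.

    Hypothetical helper for illustration; `payload` maps relative
    file names to their contents.
    """
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for name, content in sorted(payload.items()):
        target = data_dir / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.md5(content).hexdigest()
        # Each manifest line pairs a checksum with a path relative to the bag root.
        manifest_lines.append(f"{digest}  data/{name}")

    # The bag declaration names the spec version and the tag file encoding.
    (bag_dir / "bagit.txt").write_text(
        f"BagIt-Version: {version}\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-md5.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


def verify_bag(bag_dir: Path) -> bool:
    """Recompute each checksum in manifest-md5.txt and compare with the file on disk."""
    manifest = (bag_dir / "manifest-md5.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.md5((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    bag = make_bag(root / "example-bag", {"letters/0001.txt": b"Dear colleague"})
    print(sorted(p.relative_to(bag).as_posix() for p in bag.rglob("*") if p.is_file()))
    print("valid:", verify_bag(bag))
```

The receiving institution only needs the manifest to confirm that every payload file arrived intact, which is much of the appeal for transfers.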

My colleague Ed Summers has posted a fantastic overview of the specification to which I can add nothing except reinforcing the kudos deserved by everyone involved.

There is an official Internet-Draft and comments are welcome.

Friday, May 09, 2008

application profile for images

In the most recent issue of Ariadne there's an interesting article entitled "Towards an Application Profile for Images" by Mick Eadie. JISC has just completed the first phase of some work drafting an application profile for images aimed primarily at the repository community.

I was happy to see recognition of the complexity of digital image objects and the need to track relationships between images, the sources of those images, the content depicted in the images, etc. They looked at FRBR and at the VRA Core, and ended up creating a conceptual model with the digital file at the center that uses the language of FRBR.

I find myself disagreeing with some of their decisions:

In our model, we have renamed the FRBR Work entity as ‘Image’ for reasons of clarity, mainly to avoid confusion between notions of Work as described traditionally in image cataloguing in the cultural sector (i.e. the physical thing) and abstract Work as described in FRBR. As noted above, image as defined in the IAP is a digital image, in line with our notion of end-users searching repositories for digital images of something. Therefore our conceptual model - while still using the language of FRBR and using the areas of SWAP that have applicability across the text and image domains - places the digital image at its centre.

Architecturally, the JISC model makes sense. You have an image, which is a depiction of a site or a work of art, with manifestations as one or more formats of files. That's what you have to physically manage in a repository.

From a discovery point of view, though, I'm not fully convinced. Having modeled image objects in a cataloging environment and in a repository architecture and discovery environment, I think that FRBR and the VRA Core have it exactly right as to what users are looking for -- images of something. Researchers are looking for images of Chartres Cathedral or Jeff Koons' "Rabbit" or Leonardo da Vinci's "Vitruvian Man." They are looking for images of those Works. When we designed the content models for images in the UVA Digital Collections Repository, the top level is that sense of work. The top level is a work object, which has child expressions for individual views of that work, which have child manifestations that are the actual media files. We found this relatively easy to manage and to build a discovery interface around. It was also the model already used in the cataloging, so it made for an easier translation from that system to the repository.
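To make the work / expression / manifestation hierarchy concrete, here is a rough sketch in Python. It is not the actual UVA content model: the class names, fields, and the find_works helper are all hypothetical, chosen only to show how discovery can start from the work while the files hang off the bottom of the tree.

```python
from dataclasses import dataclass, field


@dataclass
class Manifestation:
    """An actual media file, e.g. a TIFF master or a JPEG derivative."""
    file_name: str
    mime_type: str


@dataclass
class Expression:
    """One view of a work (hypothetical field names)."""
    view: str
    manifestations: list[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The thing depicted: a building, a sculpture, a drawing."""
    title: str
    creator: str = ""
    expressions: list[Expression] = field(default_factory=list)

    def media_files(self) -> list[str]:
        """Collect every file name reachable from this work."""
        return [m.file_name for e in self.expressions for m in e.manifestations]


def find_works(works: list[Work], term: str) -> list[Work]:
    """Discovery starts at the work level: match on title or creator."""
    needle = term.lower()
    return [w for w in works if needle in w.title.lower() or needle in w.creator.lower()]


if __name__ == "__main__":
    chartres = Work(
        title="Chartres Cathedral",
        expressions=[
            Expression(
                view="west facade",
                manifestations=[
                    Manifestation("chartres_west.tif", "image/tiff"),
                    Manifestation("chartres_west.jpg", "image/jpeg"),
                ],
            ),
            Expression(
                view="nave interior",
                manifestations=[Manifestation("chartres_nave.tif", "image/tiff")],
            ),
        ],
    )
    for work in find_works([chartres], "chartres"):
        print(work.title, work.media_files())
```

A search for "chartres" lands on the work first, and the individual views and their files follow from there, which is the discovery behavior I am arguing for.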

I need to review the Images Application Profile (IAP) work in more detail. I know this is aimed more at use for IRs and not for image collections, but I think such a profile can only become ubiquitous if it covers both repository scenarios. Their proposed metadata is well-positioned with its use of MIX elements to manage the image files, but I see less than I'd like to see in descriptive elements that support discovery. For example, I don't see a descriptive content date element. I cannot think of a research use case that wouldn't include searching for, say, images of 18th-century French sculpture. I think they should give more thought to incorporating more from VRA Core and CDWA.

Friday, April 04, 2008

PREMIS 2.0

The PREMIS Editorial Committee has released PREMIS Data Dictionary for Preservation Metadata, version 2.0, a revision of the May 2005 report. A draft XML schema -- still undergoing a month of review before its final release -- is also available.

Audiovisual Research Collections and Their Preservation

TAPE (Training for Audiovisual Preservation in Europe) has published Audiovisual Research Collections and Their Preservation. This report looks at the requirements for access and re-use, focusing on the potential of digitization for creating distributed content-based archives.

Tuesday, March 25, 2008

review of OpenID

My interest in OpenID has recently been piqued, so I am definitely looking forward to the outcome of a JISC OpenID review:

"The primary aim of the project is to produce a report which will allow busy decision-makers to understand OpenID’s security properties well enough, quickly enough, to apply it safely and avoid its potential security pitfalls, based on first establishing by means of a survey a sound understanding of how such decision-makers are likely to proceed in the absence of such guidance. The secondary aims are to develop bridging software that will allow OpenIDs from any source to be used as identities within the production UK (SAML) federation, creating opportunities for early adopters to experiment. We will also demonstrate a library-type service modified to make use of such identities."

Monday, February 18, 2008

MODS tools

We're chatting a lot about MODS at UVA, partly about data sharing via OAI and partly about possibly replacing some local metadata standards.

In the great synchronicity of our community, there were two posts on MODS creation tools today, one from the DIL and one from Peter Binkley. Between the tools mentioned in those posts and the DLF Aquifer MODS profile work, it's getting easier to work with MODS every day.

Friday, February 15, 2008

CrossRef Citation plugin

I'm running a WordPress and CommentPress experiment (very low key right now), and now I must look into the CrossRef Citation plugin.

museum data exchange study

Mellon has funded a great project by OCLC/RLG Programs to prototype data exchange between museums. Here's the press release. I spent many years in the museum community dealing with this, so I am beyond thrilled to see movement in this area. I would like to have seen more vendor systems involved in the pilot, but this is still a very welcome project.

Thursday, February 07, 2008

OpenID gets some major press with some major names

Via ReadWriteWeb, the OpenID Foundation announced this morning that Google, IBM, Microsoft, VeriSign and Yahoo! have taken seats as the organization's first corporate board members. Support is growing.

Saturday, January 19, 2008

more openID support news

After testing OpenIDs as logins to Blogger in a prototype program in November 2007, Google has become an OpenID provider. Effective immediately, Blogger users can use their blog's URL as an OpenID login, after toggling the option via the draft.blogger.com admin menu. Read the posting on TechCrunch.

Thursday, January 17, 2008

support for OpenID

Yahoo has thrown some support behind OpenID when used in conjunction with Yahoo IDs. ReadWriteWeb has a thorough analysis.

Tuesday, January 15, 2008

LCSH makes the Washington Post

In 2006 a decision was made at the Library of Congress to do away with the subject headings for Scottish literature, instead suggesting "English literature" headings. There was more than a small kerfuffle over this when it was belatedly reported in the Times of London and the BBC last December.

Today the Washington Post actually took note of this, publishing an article on the reversal of the decision. The Times has also taken note. In fact, it's all over the news this morning. Interesting that a standards terminology issue has become so political.

Thursday, January 10, 2008

Final Version of the LC Working Group on the Future of Bibliographic Control Report Released

The final version of the highly anticipated report from the Library of Congress Working Group on the Future of Bibliographic Control was released today -- "On the Record: Report of The Library of Congress Working Group on the Future of Bibliographic Control." I don't know when I'll have a chance to look at it, but I understand that changes were incorporated from comments received during the review period.