Tuesday, June 24, 2008

copyright renewal records

The Inside Google Book Search blog announced the availability of U.S. copyright renewal records as an XML file. Google built the set from scanned and keyboarded versions created by the Carnegie Mellon Universal Library Project and Project Gutenberg, cleaned up the files so they parse more reliably, and has now made them available for download.

Google thinks that this set is "the best and most comprehensive set of renewal records available today." I am not sure how these records differ, in data or in the years covered, from the U.S. Copyright Office copyright registration database made available through the efforts of DLF and Public.Resource.Org. Either way, it is useful that Google has put these out as parsable XML.
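
For anyone who wants to get a feel for a large XML file like this before writing a real parser, here is a minimal Python sketch that streams through a file and counts how often each element name occurs. I have not confirmed the schema of the Google file, so the sketch assumes nothing about it: the function name `count_tags` and the tiny sample document (with its `records`, `record` and `title` elements) are made up for illustration and are not taken from the actual data.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(source):
    """Stream through an XML file and count occurrences of each element name.

    `source` may be a filename or a binary file-like object.
    """
    counts = Counter()
    # iterparse yields each element as its closing tag is read, so the
    # whole document never has to be built up front.
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[elem.tag] += 1
        elem.clear()  # drop the element's text and children to limit memory use
    return counts


# Hypothetical stand-in data; the real file's element names may differ.
sample = io.BytesIO(
    b"<records>"
    b"<record><title>A</title></record>"
    b"<record><title>B</title></record>"
    b"</records>"
)

tag_counts = count_tags(sample)
for tag, n in tag_counts.most_common():
    print(tag, n)
```

Pointing `count_tags` at the downloaded file instead of the sample should give a quick inventory of the element names it uses, which is a reasonable first step before comparing its coverage with other record sets.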

1 comment:

inkdroid.org said...

innarestin' it looks like the Google data comes from Project Gutenberg, whereas the public.resource data comes (presuming O'Reilly and Malamud purchased it) directly from LC?

It's kinda neat to see Orwant and Jarkko (old school Perl and CPAN hackers) are behind the Google effort.