Friday, October 31, 2008

court rules that hash analysis is a fourth amendment search

The U.S. District Court for the Middle District of Pennsylvania has issued an opinion in the case United States v. Crist that a hash value analysis in a criminal investigation counts as a Fourth Amendment "search." Read a synopsis at ars technica.
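For readers unfamiliar with the technique: hash value analysis generally means computing a digest of each file on a drive and comparing it against a list of digests of known files. Here is a minimal Python sketch of that idea. The choice of SHA-256, the function names, and the sample data are all my own assumptions for illustration, not details drawn from the case.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of some bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_matches(files: dict, known_digests: set) -> list:
    """Return the names of files whose digest appears in known_digests."""
    return sorted(
        name for name, data in files.items() if digest(data) in known_digests
    )


# Hypothetical sample data: one "known" file and a tiny two-file "drive".
known_digests = {digest(b"known file contents")}
files_on_drive = {
    "a.txt": b"known file contents",
    "b.txt": b"something else",
}

print(find_matches(files_on_drive, known_digests))
```

The point of the sketch is that every file has to be read in full to compute its digest, even though no person ever opens it.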

JISC Digital Preservation Policies Study

JISC has released a two-part study of digital preservation policies: Digital Preservation Policies Study and Digital Preservation Policies Study, Part 2: Appendices—Mappings of Core University Strategies and Analysis of Their Links to Digital Preservation. The study aims to provide an outline model for digital preservation policies and to analyse the role that digital preservation can play in supporting and delivering key strategies for higher ed institutions.

cloud computing

An interesting new book -- The Tower and The Cloud: Higher Education in the Age of Cloud Computing -- has been published by Educause. The term "cloud computing" is usually used to refer to applications that run on remote systems in "the cloud" rather than on desktop computers or to the storage of files remotely rather than locally, but the book defines the term more broadly, including open-source software and social-networking tools. The full book is available online as a free PDF.

twitter war of the worlds

Thanks to Amanda for pointing this out -- I am addictively following the Twitter production of War of the Worlds, an homage to the Orson Welles radio production. How this came about is described at the Ask a Wizard blog.

Tuesday, October 28, 2008

google book search settlement agreement announced

Today it was announced that Google has reached a settlement in the lawsuit filed by the Authors Guild, the Association of American Publishers, and a group of individual authors.

Some of the details are available at Google. The changes that I am the most interested in are these:

"Until now, we've only been able to show a few snippets of text for most of the in-copyright books we've scanned through our Library Project. Since the vast majority of these books are out of print, to actually read them you'd have to hunt them down at a library or a used bookstore. This agreement will allow us to make many of these out-of-print books available for preview, reading and purchase in the U.S.. Helping to ensure the ongoing accessibility of out-of-print books is one of the primary reasons we began this project in the first place, and we couldn't be happier that we and our author, library and publishing partners will now be able to protect mankind's cultural history in this manner."

"The agreement will also create an independent, not-for-profit Book Rights Registry to represent authors, publishers and other rightsholders. In essence, the Registry will help locate rightsholders and ensure that they receive the money their works earn under this agreement. You can visit the settlement administration site, the Authors Guild or the AAP to learn more about this important initiative."

I'm all for more access to these books and for rightsholders to get their due, but what does it mean to assign a value to them?

They also plan to offer subscriptions: "We'll also be offering libraries, universities and other organizations the ability to purchase institutional subscriptions, which will give users access to the complete text of millions of titles while compensating authors and publishers for the service." I have mixed feelings -- the subscription model is not an unusual one, and libraries have certainly provided digitized materials from their collections for paid subscription services before, e.g., with ProQuest. I wonder if the partners will get any share in the compensation for providing the content for the service?

I'm currently at an Open Content Alliance meeting and I'm looking forward to what I am sure will be many discussions among the attendees today.

EDIT: There's now a joint press release from the University of Michigan, the University of California, and Stanford University, a FAQ from the Association of American Publishers, a Google rightsholders site, and a Google blog post, in addition to the site above and the press release.

Friday, October 24, 2008

search engine cache isn't copyright infringement

Some argue that search engines are copyright violators because they crawl, index, and keep an archive of web sites. That copied archive -- or cache -- is, according to this argument, an unauthorized copy. Via TechDirt: the U.S. District Court for the Eastern District of Pennsylvania held that a Web site operator's failure to deploy a robots.txt file containing instructions not to copy and cache Web site content gave rise to an implied license to index that site.

In Parker v. Yahoo!, Inc., 2008 U.S. Dist. LEXIS 74512 (E.D. Pa. Sep. 26, 2008), the court found that the plaintiff's acknowledgment that he deliberately chose not to deploy a robots.txt file on the site containing his work was conclusive on the issue of implied license. In so ruling the court followed Field v. Google, a similar copyright infringement action brought by an author who failed to deploy a robots.txt file and whose works were copied and cached by the Google search engine.

The court further ruled, though, that a nonexclusive implied license may be terminated. Parker may have terminated the implied license by the institution of the litigation, and he alleged that the search engines failed to remove copies of his works from their cache even after the litigation was instituted. If proved, "the continued use over Parker's objection might constitute direct infringement." That issue will likely be resolved at a later date.
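To make the robots.txt mechanism concrete: a well-behaved crawler reads a site's robots.txt rules before fetching or caching a page, and treats the absence of rules as permission. A minimal sketch using Python's standard urllib.robotparser module follows. The bot name, the URL, and the rules are hypothetical, and I'm not claiming that any particular search engine works exactly this way.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: one site blocks all crawlers, the other has no rules.
blocking_rules = "User-agent: *\nDisallow: /"
no_rules = ""

page = "http://example.com/essay.html"
print(may_fetch(blocking_rules, "ExampleBot", page))
print(may_fetch(no_rules, "ExampleBot", page))
```

With the blocking rules the crawler is told to stay away; with no rules at all, the parser reports that fetching is allowed, which is the default the implied-license argument leans on.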

For an analysis, see the New Media and Technology Law Blog.

The same plaintiff's earlier case, Parker v. Google, Inc., No. 06-3074 (3d Cir. July 10, 2007), is also a search engine copyright infringement case.

Wednesday, October 22, 2008

tiny faces

This struck me as hilarious -- someone noted that a box of Cascadian Farms frozen broccoli had teeny, tiny faces worked into the image on the label.

A comment in another blog said this (unsubstantiated):

"They've been putting tiny faces of employees, family and friends on the labels since at least 1995, which was when someone first showed me this on the labels of Cascadian Farms jams when I was first working at Fresh Fields (later bought by Whole Foods). CF has since been bought by General Mills, but it seems the tiny faces continue."

A bulletin board thread from 2007 claimed that the creamed corn packaging had a little hidden baby's face. Strange. That thread described it as a version of an "easter egg" in a video game or DVD ... an undocumented feature that you have to really try to find. That's pretty apt.

Sunday, October 19, 2008

los angeles food nostalgia

On a long drive recently, my partner Bruce and I were reminiscing about places we used to eat at in Los Angeles. He grew up there and has a longer list than I (maybe for another post). So many places we used to patronize as recently as the early 1990s are now gone, or, as the L.A. Time Machines site puts it, "extinct." I decided that I would try to write down the places I remember frequenting that are no longer open. Then I started semi-obsessively researching them.

  • A Chinese restaurant on Sunset Blvd or Antioch in Pacific Palisades. I don't remember the name or the food, but I do remember the old school Cantonese-American black, red, and gold dragon-decorated bar that served cocktails in tiki glasses with umbrellas. I used to order Fog Cutters or Mai Tais.
  • Blum’s in the Plaza Building at the Disneyland Hotel complex. Mom and I used to visit Disneyland every summer when we went down to L.A. to visit grandmother Johnston in Beverly Hills. Mom loved to stay in the old "Garden Rooms" section of the Disneyland Hotel (she called them the Lanai rooms). Every day we ate at least one meal at Blum's before or after a monorail trip. And on every trip we watched the "Dancing Waters" at the hotel complex.
  • Cafe Casino with locations on Gayley Ave in Westwood and on Ocean Blvd in Santa Monica. I ate a lot at the Westwood location (there was a great vintage poster store next door), and Bruce ate at the Santa Monica location.
  • Café Katsu on Sawtelle Blvd in West L.A. One of the first places Bruce and I went on a date. It was very small but the food was extraordinary. It was down the street from a Japanese restaurant that I cannot remember the name of that had fabulous grilled squid, and nearby the take-away hole-in-the-wall Tempura House that you had to visit early or they'd be out of the shrimp and sweet potatoes.
  • D.B. Levy’s sandwiches on Lindbrook Drive in Westwood. Located above what I seem to remember was a Burger King and near the now-demolished National movie theater, this place had a massive menu of sandwiches named after celebrities.
  • English Tea Room on Glendon Ave in Westwood. You walked off the street into a brick courtyard to enter this very quaint tea room. I almost always had the Welsh Rarebit and the bread and butter pudding.
  • Gianfranco restaurant and deli on Santa Monica Blvd in West L.A. This was the place a group of us ate on a very regular basis when undergraduates. I had a weakness for the gnocchi with pesto. My friend Cynthia remembers nothing but their delicate hazelnut cake.
  • Gorky’s Café at 536 East 8th Street in the downtown Garment District. Saturday mornings when I was in need of fabric or trim or beads, I'd drive to downtown L.A. to the garment district before any place was open, because the shopping day had to start with blintzes at Gorky's.
  • India's Oven on Pico Blvd in Culver City(?). The original location, with the disposable plates and cutlery.
  • Kelbo’s on Pico Blvd in West L.A. Polynesian tiki tacky like you cannot believe. Friends went there for the cheap, huge communal well drinks with lots of long straws. I went for the tiki tacky. They also sold painted faux stained glass for some reason.
  • Knoll's Black Forest Inn on Wilshire Blvd in Santa Monica. An unchanging decor and menu for decades.
  • Merlin McFly's on Main St. in Santa Monica. I remember it being near a great vintage clothing store. The draw was the amazing stained glass windows that featured historic magicians. The restaurant is long gone, but the windows were saved and are now at a venue called Magicopolis.
  • Mie and Mie Falafal in Westwood. I had never had falafel before my freshman year of college. Our dorm was being renovated for the 1984 Olympics and had no dining hall, so I often found myself there.
  • Moise’s Mexican on Santa Monica Blvd near Federal in West L.A. I lived four blocks away and always ordered the same thing -- carne en serape, a burrito filled with beef in a sour cream sauce, covered in cheese and quickly broiled.
  • Panda Inn on Pico Blvd in West L.A. An elegant 80s room, the 2nd or 3rd location after Pasadena, before it went national. It was at the Westside Pavilion mall, an upscale mall designed by Jon Jerde.
  • The Penguin Coffee Shop at 1670 Lincoln Blvd in Santa Monica. My friend Cynthia and I loved it for its Penguin logo. The sign is partly still there (as is the Googie-style building), even though it became an orthodontic office.
  • Polly's Pies at 501 Wilshire Blvd in Santa Monica. Bruce loved this place.
  • R.J.s for Ribs at 252 N. Beverly Blvd in Beverly Hills. The ribs were good, but the real joy was trying to stump your waiter when he asked what animal he should fashion your foil package for leftover food into. Armadillos, bats, ...
  • Robata on Santa Monica Blvd in West L.A. near the Nuart theater. I was a very frequent attendee when the Nuart (and the Fox on Lincoln in Venice) were full-time revival houses. Robata was, as you'd expect, Japanese robata, or grilled, skewered food.
  • The Sculpture Gardens restaurant on Abbot Kinney in Venice. Bruce and I ate brunch there very frequently, but no one else seems to remember it, with its multiple little buildings surrounding a funky courtyard with sculptures. They had the best breakfast bread basket and baked apple pancakes.
  • Ship’s coffee shop at 10877 Wilshire Blvd in Westwood. A classic Googie-style coffee shop with toasters at every table. My friend Kevin ate there a lot.
  • Tampico Tilly's on Wilshire in Santa Monica. Cheap, decent Mexican in a huge faux rancho house. I think El Cholo took over the building.
  • Trader Vic’s at 9876 Wilshire Blvd in Beverly Hills. I remember eating there every summer with my grandmother Johnston. I was always allowed to order a Shirley Temple, which feels pretty daring when you're 10 years old. Sometimes we also ate at The Velvet Turtle on Sepulveda Blvd.
  • Wildflour pizza on Wilshire Blvd in Santa Monica. There is still one location open, but this is the one I remember best. They had a to-die-for spinach salad with marinated artichoke hearts.
  • Zucky’s Deli at 431 Wilshire Blvd at 5th St in Santa Monica. I had a roommate in college who was from New York via Florida. He took me to Zucky's for my first egg cream, and introduced me to Fox's Ubet syrup. When I worked at the Getty Research Institute when it was at 4th and Wilshire, I often stopped in for the fabulous corn muffins from their bakery. Izzy's was across the street -- they had good pie but I remember really disliking their tuna melts. What an odd thing to remember.
Of course there were lots of other places that are still there ... Angeli Caffe on Melrose Ave in West Hollywood, Anna Maria's Trattoria on Wilshire in Santa Monica, The Apple Pan on Pico, the Border Grill on 4th Street in Santa Monica, the Broadway Deli on the 3rd Street Promenade in Santa Monica, Campanile on La Brea Ave in North Hollywood, Chaya Venice on Navy Street in Venice, Chin Chin on San Vicente Blvd in Brentwood, i Cugini on Ocean Avenue in Santa Monica, Dhaba Indian on Main in Santa Monica, Empress Pavilion on Hill St in Chinatown, Father's Office on Montana in Santa Monica, Marix Playa on Entrada in Pacific Palisades, Noma Sushi on Wilshire Blvd in Santa Monica, Stan's Donuts on Weyburn in Westwood (the best apple fritters), Robin Rose ice cream on Rose Ave. in Venice, The Rose Cafe on Rose Ave. in Venice, Snug Harbor on Wilshire in Santa Monica, Thai Dishes on Wilshire in West L.A., Versailles Cuban on Venice Blvd in Culver City, Woo Lae Oak Korean on Western at Wilshire (the best place to eat before shows at the Wiltern theater), Ye Olde King's Head on Santa Monica Blvd in Santa Monica (why do I think it was somewhere else before?) ... and likely dozens that I don't remember right now.

Friday, October 17, 2008

digital book access at Johns Hopkins

Jonathan Rochkind has posted a great description of digital book access features that he's put into production in the link resolver and OPAC at Johns Hopkins. They're remarkable in the sense that he's taken advantage of so many different service APIs (Google Books, IA, OCLC, Amazon, HathiTrust) to provide functionality with conditional options to provide as much collection coverage as possible.
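The general pattern of trying several services in turn until one can supply a copy can be sketched in a few lines of Python. To be clear, this is my own illustration and not Jonathan's code: the two lookup functions are hypothetical stand-ins that read from small in-memory tables rather than calling any real API, and the identifiers and URLs are made up.

```python
from typing import Callable, List, Optional, Tuple

Lookup = Callable[[str], Optional[str]]


# Hypothetical stand-ins for real service lookups: each returns a URL or None.
def lookup_service_a(isbn: str) -> Optional[str]:
    catalog = {"9780000000001": "https://service-a.example/books/1"}
    return catalog.get(isbn)


def lookup_service_b(isbn: str) -> Optional[str]:
    catalog = {"9780000000002": "https://service-b.example/books/2"}
    return catalog.get(isbn)


def first_available(
    isbn: str, services: List[Tuple[str, Lookup]]
) -> Optional[Tuple[str, str]]:
    """Try each service in order; return (service name, URL) for the first hit."""
    for name, lookup in services:
        url = lookup(isbn)
        if url is not None:
            return name, url
    return None


SERVICES = [("service-a", lookup_service_a), ("service-b", lookup_service_b)]

print(first_available("9780000000002", SERVICES))
```

The order of the list sets the preference among services, so adding another source means adding one more entry rather than rewriting the lookup logic.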

obstacles to universal access

I've just read an interesting paper from a presentation at the recent CIDOC meeting: Nicholas Crofts, “Digital Assets and Digital Burdens: Obstacles to the Dream of Universal Access,” 2008 Annual Conference of CIDOC (Athens, September 15-18, 2008).

The premise is that technology is not the issue keeping our institutions from reaching a goal of universal access -- it's a number of post-technical issues, including varied intellectual property barriers, institutions' desires to protect their digital assets, and collection documentation that is not well-suited to sharing.

From the section on "Suitability of Documentation":

... but while this technical revolution has taken place, there has not been a corresponding revolution in documentation practice. The way that documentation is prepared and maintained and the sort of documentation that is produced are still heavily influenced by pre-Internet assumptions. The documentation found in museums – the raw material for diffusion – is often ill-suited for publication.

From the conclusion:

While making cultural material freely available is part of their mission, and therefore a goal that they are obliged to support, it may still come into conflict with other factors, notably commercial interests: the need to maintain a high-profile and to protect an effective brand image. If museums are to cooperate successfully and make digital resources widely available on collaborative platforms, they will either need to find ways of avoiding institutional anonymity, or agree to put aside their institutional identity to one side.

It's a frank and interesting paper. I think there has been progress in documentation practice -- look at the CCO and the Aquifer Shareable Metadata efforts, and the earlier Categories for the Description of Works of Art -- but it's true that this hasn't yet taken hold in a widespread way.

Wednesday, October 15, 2008

First Monday article on Google Books and OCA

The newest issue of First Monday (volume 13, number 10, 6 October 2008) has an interesting article by Kalev Leetaru -- "Mass book digitization: The deeper story of Google Books and the Open Content Alliance." The article compares what is publicly known about the Google Books and OCA projects.

From the conclusions:

While on their surface, the Google Books and Open Content Alliance projects may appear very different, they in fact share many similarities:

  • Both operate as a black box outsourcing agent. The participating library transports books to the facility to be scanned and fetches them when they are done. The library provides or assists with housing for the facility, but its personnel are not permitted to operate the scanning units, which must be staffed by personnel from either Google or OCA.

  • Neither publishes official technical reports. Google engineers have published in the literature on specific components of their project, which offer crucial insights into the processes they use, while talks from senior leadership have yielded additional information. OCA has largely been absent from the literature and few speeches have unveiled substantial technical details. Both projects have chosen not to issue exhaustive technical reports outlining their infrastructure: Google due to trade secret concerns and OCA due to a lack of available time.

  • Both digitize in–copyright works. Google Books scans both out–of–copyright books and those for which copyright protection is still in force. OCA scans out–of–copyright books and only scans in–copyright books when permission has been secured to do so. Both initiatives maintain partnerships with publishers to acquire substantial in–copyright digital content.

  • Both use manual page turning and digital camera capture. Large teams of humans are used to manually turn pages in front of a pair of digital cameras that snap color photographs of the pages.

  • Both permit libraries to redistribute materials digitized from their collections. While redistribution rights vary for other entities, both the Google Books and OCA initiatives permit the library providing a work for digitization to host its own copy of that digitized work for selected personal use distribution.

  • Both permit unlimited personal use of out–of–copyright works. While redistribution rights vary for other entities, both the Google Books and OCA initiatives permit the library providing a work for digitization to host its own copy of that digitized work for selected personal use distribution.

  • Both enforce some restrictions on redistribution or commercial use. Google Books enforces a blanket prohibition on the commercial use of its materials, while at least one of OCA’s scanning partners does the same. Google requires users to contact it about redistribution or bulk downloading requests, while OCA permits any of its member institutions to restrict the redistribution of their material.

From the section on "Transparency":

A common comparison of the Google Books and Open Content Alliance projects revolves around the shroud of secrecy that underlies the Google Books operation. However, one may argue that such secrecy does not necessarily diminish the usefulness of access digitization projects, since the underlying technology and processes do not matter, only the final result. This is in contrast to preservation scanning, in which it may be argued that transparency is an essential attribute, since it is important to understand the technologies being used so as to understand the faithfulness of the resulting product. When it comes down to it, does it necessarily matter what particular piece of software or algorithm was used to perform bitonal thresholding on a page scan? When the intent of a project is simply to generate useable digital surrogates of printed works, the project may be considered a success if the files it offers provide digital access to those materials.

To me, that paragraph gets at the key issue in discussing and comparing the projects -- are books being scanned in a consistent way and being made accessible through at least one portal, enforcing current rights restrictions? Yes? Then both these projects are, at a basic level, successful and provide a useful service.

Yes, there are issues to quibble with for both projects. More technical transparency is desirable for both projects. Both have controlled workflows that limit what can be contributed to the projects in different ways. There are aspects of the Google workflow that Google contractually requires its partners to keep secret. That's their right to include in their contracts, and a potential partner's decision to make if they find it objectionable and therefore choose not to participate. Each documents and enforces rights in different ways and to different extents -- we should be looking to standards in that area. Each sets different requirements for allowing reuse. If only there could be agreement.

One note on preservation. Neither project is a preservation project -- they're access projects. Even if there were something we could point to and say "that's a preservation-quality digital surrogate" -- if such a concept as "preservation-quality" exists -- neither project aims for that. Both projects do, however, allow the participating libraries to preserve the files created through the projects. These files should and must be preserved because they can be used to provide digital modes of access, and, in some cases, they may be the only surrogates ever made if the condition of a book has deteriorated. Look at the HathiTrust for more on the topic of preserving the output of mass digitization projects.

And one note about the Google project providing "free" digitization for its participants. Yes, Google is underwriting the cost of digitization. But each partner library is bearing the cost of staffing and supplies for project management, checkout/checkin, shelving, barcoding, cataloging, and conservation activities, not to mention storage and management of the files. The overall cost is definitely reduced, but not free.

Tuesday, October 14, 2008

Frankfurt Book Fair survey on digitization

Via TeleRead, the 2008 Frankfurt Book Fair conducted a survey on how digitization will shape the future of publishing. The summary results are available in a press release.

These are the top four challenges facing the industry identified through the survey:

• copyright – 28 per cent
• digital rights management – 22 per cent
• standard format (such as epub) – 21 per cent
• retail price maintenance – 16 per cent

I don't know the details behind these concerns in the survey results, but as generalizations the first three overlap in interesting ways with the challenges facing digital collection building in libraries. What are appropriate terms for copyright and licensing for libraries? How do we identify/document copyright (and other rights) status? How do we manage access and provide for fair use with varying DRM scenarios? What standards will enhance preservation and ongoing access?

Wednesday, October 08, 2008

DCC Curation Lifecycle Model

Via the Digital Curation Blog, I came across the DCC Curation Lifecycle Model. This is a very interesting high-level overview of the life cycle stages in digital curation efforts. There's an introductory article available.

The model proposes a generic set of sequential activities -- creating or receiving content, appraisal, ingest, preservation events, storage, etc. There are some decision points at the appraisal and preservation event stages about next steps -- refusal, reappraisal, migration, etc. A colleague and I sat together and looked it over this afternoon. We were both looking at it from the perspective of a digital collections repository and not an IR, and the model was designed primarily with IRs in mind, so our thoughts are coming from a different place in terms of what we wanted to see additionally taken into account in the visualization.

There's a "transform" activity -- definitely something that takes place potentially multiple times in a data life cycle. In the visualization this appears sequentially after "store" and "access, use and reuse." This is an activity that's hard to include in a visualization of a sequence because it can take place at so many points, but it feels like it should be earlier in the sequence, perhaps before those two steps.

The next ring is labeled with the activities "curate" and "preserve" with arrows. Does the placement of the terms and arrows mean anything in relation to the outermost ring? Are "ingest," "preservation activity" and "store" part of "preserve" and the rest part of "curate?" Or does this more simply represent ongoing activities?

The center of the model is the data. It's surrounded by a ring for descriptive and presentation information. It's an activity of central importance and is directly related to the data as is shown, but we weren't sure how its placement related to the sequence of tasks in the visualization.

"Preservation planning" is the next ring out. Planning and implementation are a central, ongoing activity. We also weren't sure where this ongoing activity meshed with the sequence.

"Community watch and participation" is the last remaining inner ring. It's also an ongoing activity. What actions might the outcomes of this activity affect?

Overall, this is a good model for planning. It's challenging to create a visualization for complex processes and dependencies and this covers a lot of ground. And of course it's meant to be generic and high-level, to be made more concrete by an institution that makes use of it. It certainly stimulated our thinking in terms of how we might model our data life cycle and the dependencies between the various tasks.

NOTE: Sarah Higgins, who created the model, has provided excellent responses to my thoughts and questions in the comments to this post. Please read them!

Sunday, October 05, 2008

Gettysburg Cyclorama

A cyclorama was the cutting-edge multimedia installation of its time in the 1870s-90s. A massive 360-degree painting in the round, it was often accompanied by narration, music, and a light show to heighten the illusion. Today I went to see the conserved, restored, and reinstalled Gettysburg Cyclorama at the new visitors' center. The center opened in April, but the Cyclorama only reopened 10 days ago.

I'm glad I went. True, you only get to spend 15 minutes in the Cyclorama gallery, and you have to sit through a short movie about the battle first because the museum, movie, and cyclorama are on one ticket. But the new museum is very nicely designed and installed (and extensive), the movie is not too long and very well done, and the tickets are reasonably priced.

The painting (by Paul Philippoteaux, 1884) was installed in the new facility with its diorama foreground illusions recreated. They run a 15-minute narrated sound and light show to recreate Pickett's Charge (dawn over the battlefield is amazing), then they bring up the lights for a few minutes so you can see the entire painting clearly. In some spots the diorama leads seamlessly into the painting. It's still an amazing illusion and it takes your breath away.

The Cyclorama painting was previously housed in a Richard Neutra-designed building at Gettysburg. The Neutra building is scheduled for demolition in December 2008, but there is litigation to attempt to stop it. That will be a difficult case -- battlefield restoration versus Modern architecture preservation.

Friday, October 03, 2008

Federal Agencies Digitization Guidelines Initiative

The Federal Agencies Digitization Guidelines Initiative site went live on September 30, 2008. The initiative represents a collaborative effort between U.S. government agencies to establish a common set of guidelines for digitizing historical materials. Participants include the Defense Visual Information Directorate, the Library of Congress, the National Agricultural Library, the National Archives and Records Administration, the National Gallery of Art, the National Library of Medicine, the National Technical Information Service, the National Transportation Library, the Smithsonian Institution, the U.S. Geological Survey, the U.S. Government Printing Office, and The Voice of America.

The Still Image Working Group is focusing its efforts on books, manuscripts, maps, and photographic prints and negatives. There are draft "Digital Imaging Framework" and "TIFF Image Metadata" documents available. The Audio-Visual Working Group effort will cover sound and video recordings and will consider the inclusion of motion picture film as the project proceeds. That group is still at the document drafting stage.

Thursday, October 02, 2008

interesting re-use of American Memory content

From Boing Boing:

American Memory is a new and compelling DVD coming from extended Skinny Puppy posse members William Morrison and Justin Bennett later this year. It took me a while to figure out exactly what was going on (and exactly who was responsible), but that didn't detract from this hypnotic and ultimately forceful piece.

The voice in the clip on the DVD's trailer is that of former slave Alice Gaston, interviewed in her eighties for the Library of Congress in 1941. The actress is lip-synching to her dialogue. Videomaker William Morrison explains that the whole project works this way, using audio from the American Memory Archive along with new and processed footage. And, of course, Skinny Puppy music.

According to Morrison: "The theoretical context of the project is that some time in the very distance future, long after America is gone, some artists scouring the backwater of whatever the net has become discover the American Memory Archive. They have no context for it's meaning but are intrigued by the sights and sounds. They create surreal impressions of the material they find and broadcast it back through time. A quantum radio channel beamed into the sub conscious minds of the 21st century."

A few different permutations of the band will be playing a show on December 4 at the Gramercy in NYC.