Friday, October 24, 2008

search engine cache isn't copyright infringement

Some argue that search engines are copyright violators because they crawl, index, and keep an archive of Web sites. That copied archive -- or cache -- is, according to this argument, an unauthorized copy. Found via TechDirt: the U.S. District Court for the Eastern District of Pennsylvania held that a Web site operator's failure to deploy a robots.txt file containing instructions not to copy and cache Web site content gave rise to an implied license to index that site.

In Parker v. Yahoo!, Inc., 2008 U.S. Dist. LEXIS 74512 (E.D. Pa. Sep. 26, 2008), the court found that the plaintiff's acknowledgment that he deliberately chose not to deploy a robots.txt file on the site containing his work was conclusive on the issue of implied license. In so ruling the court followed Field v. Google, a similar copyright infringement action brought by an author who failed to deploy a robots.txt file and whose works were copied and cached by the Google search engine.

The court further ruled, though, that a nonexclusive implied license may be terminated. Parker may have terminated the implied license by the institution of the litigation, and he alleged that the search engines failed to remove copies of his works from their cache even after the litigation was instituted. If proved, "the continued use over Parker's objection might constitute direct infringement." That issue will likely be resolved at a later date.
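For readers curious what the opt-out signals at issue actually look like, here is a minimal sketch. A site operator who does not want content crawled at all can publish a robots.txt file at the site root; the directives below are standard Robots Exclusion Protocol, though whether any given crawler honors them is, as these cases show, a matter of convention rather than enforcement.

```
# robots.txt at the site root: ask all compliant crawlers
# not to fetch anything on the site.
User-agent: *
Disallow: /
```

An operator who wants pages indexed but not cached can instead use the per-page "noarchive" robots meta tag -- the mechanism discussed in Field v. Google:

```html
<!-- In each page's <head>: index the page, but do not offer a cached copy -->
<meta name="robots" content="noarchive">
```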

For an analysis, see the New Media and Technology Law Blog.

The same plaintiff's earlier case, Parker v. Google, Inc., No. 06-3074 (3d Cir. July 10, 2007), is also a search engine copyright infringement case.

Wednesday, October 22, 2008

tiny faces

This struck me as hilarious -- someone noted that a box of Cascadian Farms frozen broccoli had teeny, tiny faces worked into the image on the label:

http://bread-and-honey.blogspot.com/2008/10/wtf-broccoli.html

A comment in another blog said this (unsubstantiated):

"They've been putting tiny faces of employees, family and friends on the labels since at least 1995, which was when someone first showed me this on the labels of Cascadian Farms jams when I was first working at Fresh Fields (later bought by Whole Foods). CF has since been bought by General Mills, but it seems the tiny faces continue."

A bulletin board thread from 2007 claimed that the creamed corn packaging had a little hidden baby's face. Strange. That thread described it as a version of an "easter egg" in a video game or DVD ... an undocumented feature that you have to really try to find. That's pretty apt.

Sunday, October 19, 2008

los angeles food nostalgia

On a long drive recently, my partner Bruce and I were reminiscing about places we used to eat at in Los Angeles. He grew up there and has a longer list than I (maybe for another post). So many places we used to patronize as recently as the early 1990s are now gone, or, as the L.A. Time Machines site puts it, "extinct." I decided that I would try to write down the places I remember frequenting that are no longer open. Then I started semi-obsessively researching them.

  • A Chinese restaurant on Sunset Blvd or Antioch in Pacific Palisades. I don't remember the name or the food, but I do remember the old school Cantonese-American black, red, and gold dragon-decorated bar that served cocktails in tiki glasses with umbrellas. I used to order Fog Cutters or Mai Tais.
  • Blum’s in the Plaza Building at the Disneyland Hotel complex. Mom and I used to visit Disneyland every summer when we went down to L.A. to visit grandmother Johnston in Beverly Hills. Mom loved to stay in the old "Garden Rooms" section of the Disneyland Hotel (she called them the Lanai rooms). Every day we ate at least one meal at Blum's before or after a monorail trip. And on every trip we watched the "Dancing Waters" at the hotel complex.
  • Cafe Casino with locations on Gayley Ave in Westwood and on Ocean Blvd in Santa Monica. I ate a lot at the Westwood location (there was a great vintage poster store next door), and Bruce ate at the Santa Monica location.
  • Café Katsu on Sawtelle Blvd in West L.A. One of the first places Bruce and I went on a date. It was very small but the food was extraordinary. It was down the street from a Japanese restaurant that I cannot remember the name of that had fabulous grilled squid, and nearby the take-away hole-in-the-wall Tempura House that you had to visit early or they'd be out of the shrimp and sweet potatoes.
  • D.B. Levy’s sandwiches on Lindbrook Drive in Westwood. Located above what I seem to remember was a Burger King and near the now-demolished National movie theater, this place had a massive menu of sandwiches named after celebrities.
  • English Tea Room on Glendon Ave in Westwood. You walked off the street into a brick courtyard to enter this very quaint tea room. I almost always had the Welsh Rarebit and the bread and butter pudding.
  • Gianfranco restaurant and deli on Santa Monica Blvd in West L.A. This was the place a group of us ate on a very regular basis when undergraduates. I had a weakness for the gnocchi with pesto. My friend Cynthia remembers nothing but their delicate hazelnut cake.
  • Gorky’s Café at 536 East 8th Street in the downtown Garment District. Saturday mornings when I was in need of fabric or trim or beads, I'd drive to downtown L.A. to the garment district before any place was open, because the shopping day had to start with blintzes at Gorky's.
  • India's Oven on Pico Blvd in Culver City(?). The original location, with the disposable plates and cutlery.
  • Kelbo’s on Pico Blvd in west L.A. Polynesian tiki tacky like you cannot believe. Friends went there for the cheap, huge communal well drinks with lots of long straws. I went for the tiki tacky. They also sold painted faux stained glass for some reason.
  • Knoll's Black Forest Inn on Wilshire Blvd in Santa Monica. An unchanging decor and menu for decades.
  • Merlin McFly's on Main St. in Santa Monica. I remember it being near a great vintage clothing store. The draw was the amazing stained glass windows that featured historic magicians. The restaurant is long gone, but the windows were saved and are now at a venue called Magicopolis.
  • Mie and Mie Falafal in Westwood. I never had falafel before my freshman year of college. Our dorm was being renovated for the 1984 Olympics and had no dining hall, so I often found myself there.
  • Moise’s Mexican on Santa Monica Blvd near Federal in West L.A. I lived four blocks away and always ordered the same thing -- carne en serape, a burrito filled with beef in a sour cream sauce, covered in cheese and quickly broiled.
  • Panda Inn on Pico Blvd in West L.A. An elegant 80s room, the 2nd or 3rd location after Pasadena, before it went national. It was at the Westside Pavilion mall, an upscale mall designed by Jon Jerde.
  • The Penguin Coffee Shop at 1670 Lincoln Blvd in Santa Monica. My friend Cynthia and I loved it for its Penguin logo. The sign is partly still there (as is the Googie-style building), even though it became an orthodontic office.
  • Polly's Pies at 501 Wilshire Blvd in Santa Monica. Bruce loved this place.
  • R.J.s for Ribs at 252 N. Beverly Blvd in Beverly Hills. The ribs were good, but the real joy was trying to stump your waiter when he asked what animal he should fashion your foil package for leftover food into. Armadillos, bats, ...
  • Robata on Santa Monica Blvd in West L.A. near the Nuart theater. I was a very frequent attendee when the Nuart (and the Fox on Lincoln in Venice) were full-time revival houses. Robata was, as you'd expect, Japanese robata, or grilled, skewered food.
  • The Sculpture Gardens restaurant on Abbott Kinney in Venice. Bruce and I ate brunch there very frequently, but no one else seems to remember it, with its multiple little buildings surrounding a funky courtyard with sculptures. They had the best breakfast bread basket and baked apple pancakes.
  • Ship’s coffee shop at 10877 Wilshire Blvd in Westwood. A classic Googie-style coffee shop with toasters at every table. My friend Kevin ate there a lot.
  • Tampico Tilly's on Wilshire in Santa Monica. Cheap, decent Mexican in a huge faux rancho house. I think El Cholo took over the building.
  • Trader Vic’s at 9876 Wilshire Blvd in Beverly Hills. I remember eating there every summer with my grandmother Johnston. I was always allowed to order a Shirley Temple, which feels pretty daring when you're 10 years old. Sometimes we also ate at The Velvet Turtle on Sepulveda Blvd.
  • Wildflour pizza on Wilshire Blvd in Santa Monica. There is still one location open, but this is the one I remember best. They had a to-die-for spinach salad with marinated artichoke hearts.
  • Zucky’s Deli at 431 Wilshire Blvd at 5th St in Santa Monica. I had a roommate in college who was from New York via Florida. He took me to Zucky's for my first egg cream, and introduced me to Fox's Ubet syrup. When I worked at the Getty Research Institute when it was at 4th and Wilshire, I often stopped in for the fabulous corn muffins from their bakery. Izzy's was across the street -- they had good pie but I remember really disliking their tuna melts. What an odd thing to remember.
Of course there were lots of other places that are still there ... Angeli Caffe on Melrose Ave in West Hollywood, Anna Maria's Trattoria on Wilshire in Santa Monica, The Apple Pan on Pico, the Border Grill on 4th Street in Santa Monica, the Broadway Deli on the 3rd Street Promenade in Santa Monica, Campanile on La Brea Ave in North Hollywood, Chaya Venice on Navy Street in Venice, Chin Chin on San Vicente Blvd in Brentwood, i Cugini on Ocean Avenue in Santa Monica, Dhaba Indian on Main in Santa Monica, Empress Pavilion on Hill St in Chinatown, Father's Office on Montana in Santa Monica, Marix Playa on Entrada in Pacific Palisades, Noma Sushi on Wilshire Blvd in Santa Monica, Stan's Donuts on Weyburn in Westwood (the best apple fritters), Robin Rose ice cream on Rose Ave. in Venice, The Rose Cafe on Rose Ave. in Venice, Snug Harbor on Wilshire in Santa Monica, Thai Dishes on Wilshire in West L.A., Versailles Cuban on Venice Blvd in Culver City, Woo Lae Oak Korean on Western at Wilshire (the best place to eat before shows at the Wiltern theater), Ye Olde King's Head on Santa Monica Blvd in Santa Monica (why do I think it was somewhere else before?) ... and likely dozens that I don't remember right now.

Friday, October 17, 2008

digital book access at Johns Hopkins

Jonathan Rochkind has posted a great description of digital book access features that he's put into production in the link resolver and OPAC at Johns Hopkins. They're remarkable in the sense that he's taken advantage of so many different service APIs (Google Books, IA, OCLC, Amazon, HathiTrust) to provide functionality with conditional options to provide as much collection coverage as possible.
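As a sketch of the kind of conditional logic such a mashup involves: the Google Book Search Dynamic Links API (one of the services mentioned) returns, for a given identifier, a viewability value that a link resolver can branch on. This is my own illustrative sketch, not Rochkind's code; the field names (`preview`, `preview_url`, `info_url`) follow the Dynamic Links API as documented, but treat the details as assumptions.

```python
def book_view_link(bibkey, response):
    """Decide which link, if any, to offer for a book.

    bibkey   -- an identifier like 'ISBN:0596000278'
    response -- parsed JSON from a jscmd=viewapi request, keyed by bibkey
    """
    entry = response.get(bibkey)
    if entry is None:
        return None  # Google Book Search knows nothing about this item
    preview = entry.get("preview", "noview")
    if preview == "full":
        return ("Full view", entry["preview_url"])
    if preview == "partial":
        return ("Limited preview", entry["preview_url"])
    # No preview at all: fall back to an "about this book" link if present
    return ("About this book", entry.get("info_url"))
```

An OPAC would run this per identifier and suppress the link entirely when the lookup returns nothing, which is what makes the coverage "conditional."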

obstacles to universal access

I've just read an interesting paper from a presentation at the recent CIDOC meeting: Nicholas Crofts, “Digital Assets and Digital Burdens: Obstacles to the Dream of Universal Access,” 2008 Annual Conference of CIDOC (Athens, September 15-18, 2008).

The premise is that technology is not the issue keeping our institutions from reaching a goal of universal access -- it's a number of post-technical issues, including varied intellectual property barriers, institutions' desires to protect their digital assets, and collection documentation that is not well-suited to sharing.

From the section on "Suitability of Documentation":

... but while this technical revolution has taken place, there has not been a corresponding revolution in documentation practice. The way that documentation is prepared and maintained and the sort of documentation that is produced are still heavily influenced by pre-Internet assumptions. The documentation found in museums – the raw material for diffusion – is often ill-suited for publication.
From the conclusion:
While making cultural material freely available is part of their mission, and therefore a goal that they are obliged to support, it may still come into conflict with other factors, notably commercial interests: the need to maintain a high profile and to protect an effective brand image. If museums are to cooperate successfully and make digital resources widely available on collaborative platforms, they will either need to find ways of avoiding institutional anonymity, or agree to put their institutional identity to one side.
It's a frank and interesting paper. I think there has been progress in documentation practice -- look at the CCO and the Aquifer Shareable Metadata efforts, and the earlier Categories for the Description of Works of Art -- but it's true that this hasn't yet taken hold in a widespread way.

Wednesday, October 15, 2008

First Monday article on Google Books and OCA

The newest issue of First Monday (volume 13, number 10, 6 October 2008) has an interesting article by Kalev Leetaru -- "Mass book digitization: The deeper story of Google Books and the Open Content Alliance." The article compares what is publicly known about the Google Books and OCA projects.

From the conclusions:

While on their surface, the Google Books and Open Content Alliance projects may appear very different, they in fact share many similarities:

  • Both operate as a black box outsourcing agent. The participating library transports books to the facility to be scanned and fetches them when they are done. The library provides or assists with housing for the facility, but its personnel are not permitted to operate the scanning units, which must be staffed by personnel from either Google or OCA.

  • Neither publishes official technical reports. Google engineers have published in the literature on specific components of their project, which offer crucial insights into the processes they use, while talks from senior leadership have yielded additional information. OCA has largely been absent from the literature and few speeches have unveiled substantial technical details. Both projects have chosen not to issue exhaustive technical reports outlining their infrastructure: Google due to trade secret concerns and OCA due to a lack of available time.

  • Both digitize in–copyright works. Google Books scans both out–of–copyright books and those for which copyright protection is still in force. OCA scans out–of–copyright books and only scans in–copyright books when permission has been secured to do so. Both initiatives maintain partnerships with publishers to acquire substantial in–copyright digital content.

  • Both use manual page turning and digital camera capture. Large teams of humans are used to manually turn pages in front of a pair of digital cameras that snap color photographs of the pages.

  • Both permit libraries to redistribute materials digitized from their collections. While redistribution rights vary for other entities, both the Google Books and OCA initiatives permit the library providing a work for digitization to host its own copy of that digitized work for selected personal use distribution.

  • Both permit unlimited personal use of out–of–copyright works.

  • Both enforce some restrictions on redistribution or commercial use. Google Books enforces a blanket prohibition on the commercial use of its materials, while at least one of OCA’s scanning partners does the same. Google requires users to contact it about redistribution or bulk downloading requests, while OCA permits any of its member institutions to restrict the redistribution of their material.

From the section on "Transparency":
A common comparison of the Google Books and Open Content Alliance projects revolves around the shroud of secrecy that underlies the Google Books operation. However, one may argue that such secrecy does not necessarily diminish the usefulness of access digitization projects, since the underlying technology and processes do not matter, only the final result. This is in contrast to preservation scanning, in which it may be argued that transparency is an essential attribute, since it is important to understand the technologies being used so as to understand the faithfulness of the resulting product. When it comes down to it, does it necessarily matter what particular piece of software or algorithm was used to perform bitonal thresholding on a page scan? When the intent of a project is simply to generate useable digital surrogates of printed works, the project may be considered a success if the files it offers provide digital access to those materials.
To me, that paragraph gets at the key issue in discussing and comparing the projects -- are books being scanned in a consistent way and being made accessible through at least one portal, enforcing current rights restrictions? Yes? Then both these projects are, at a basic level, successful and provide a useful service.

Yes, there are issues to quibble with for both projects. More technical transparency is desirable for both projects. Both have controlled workflows that limit what can be contributed to the projects in different ways. There are aspects of the Google workflow that Google contractually requires its partners to keep secret. That's their right to include in their contracts, and a potential partner's decision to make if they find it objectionable and therefore choose not to participate. Each documents and enforces rights in different ways and to different extents -- we should be looking to standards in that area. Each sets different requirements for allowing reuse. If only there could be agreement.

One note on preservation. Neither project is a preservation project -- they're access projects. Even if there were something we could point to and say "that's a preservation-quality digital surrogate" -- if such a concept as "preservation-quality" exists -- neither project aims for that. Both projects do, however, allow the participating libraries to preserve the files created through the projects. These files should and must be preserved because they can be used to provide digital modes of access, and, in some cases, they may be the only surrogates ever made if the condition of a book has deteriorated. Look at the HathiTrust for more on the topic of preserving the output of mass digitization projects.

And one note about the Google project providing "free" digitization for its participants. Yes, Google is underwriting the cost of digitization. But each partner library is bearing the cost of staffing and supplies for project management, checkout/checkin, shelving, barcoding, cataloging, and conservation activities, not to mention storage and management of the files. The overall cost is definitely reduced, but not free.

Tuesday, October 14, 2008

Frankfurt Book Fair survey on digitization

Via TeleRead, the 2008 Frankfurt Book Fair conducted a survey on how digitization will shape the future of publishing. The summary results are available in a press release.

These are the top four challenges facing the industry identified through the survey:

• copyright – 28 per cent
• digital rights management – 22 per cent
• standard format (such as epub) – 21 per cent
• retail price maintenance – 16 per cent

We don't know the details behind these concerns from the summary results, but as generalizations the first three overlap in interesting ways with challenges facing digital collection building in libraries. What are appropriate terms of copyright and licensing for libraries? How do we identify and document copyright (and other rights) status? How do we manage access and provide for fair use under varying DRM scenarios? What standards will enhance preservation and ongoing access?

Wednesday, October 08, 2008

DCC Curation Lifecycle Model

Via the Digital Curation Blog, I came across the DCC Curation Lifecycle Model. This is a very interesting high-level overview of the life cycle stages in digital curation efforts. There's an introductory article available.

The model proposes a generic set of sequential activities -- creating or receiving content, appraisal, ingest, preservation events, storage, etc. There are some decision points at the appraisal and preservation event stages about next steps -- refusal, reappraisal, migration, etc. A colleague and I sat down together and looked it over this afternoon. We were both looking at it from the perspective of a digital collections repository rather than an IR, and the model was designed primarily with IRs in mind, so our thoughts come from a different place in terms of what we wanted to see additionally taken into account in the visualization.

There's a "transform" activity -- definitely something that takes place potentially multiple times in a data life cycle. In the visualization this appears sequentially after "store" and "access, use and reuse." This is an activity that's hard to include in a visualization of a sequence because it can take place at so many points, but it feels like it should be earlier in the sequence, perhaps before those two steps.

The next ring is labeled with the activities "curate" and "preserve" with arrows. Does the placement of the terms and arrows mean anything in relation to the outermost ring? Are "ingest," "preservation activity" and "store" part of "preserve" and the rest part of "curate?" Or does this more simply represent ongoing activities?

The center of the model is the data, surrounded by a ring for descriptive and presentation information. Description is an activity of central importance and is directly related to the data, as shown, but we weren't sure how its placement related to the sequence of tasks in the visualization.

"Preservation planning" is the next ring out. Planning and implementation are central, ongoing activities. We also weren't sure how this ongoing activity meshes with the sequence.

"Community watch and participation" is the last remaining inner ring. It's also an ongoing activity. What actions might the outcomes of this activity affect?

Overall, this is a good model for planning. It's challenging to create a visualization for complex processes and dependencies and this covers a lot of ground. And of course it's meant to be generic and high-level, to be made more concrete by an institution that makes use of it. It certainly stimulated our thinking in terms of how we might model our data life cycle and the dependencies between the various tasks.

NOTE: Sarah Higgins, who created the model, has provided excellent responses to my thoughts and questions in the comments to this post. Please read them!

Sunday, October 05, 2008

Gettysburg Cyclorama

A cyclorama was the cutting-edge multimedia installation of its time in the 1870s-90s. A massive 360-degree painting in the round, it was often accompanied by narration, music, and a light show to heighten the illusion. Today I went to see the conserved, restored, and reinstalled Gettysburg Cyclorama at the new visitors' center. The center opened in April, but the Cyclorama only reopened 10 days ago.

I'm glad I went. True, you only get to spend 15 minutes in the Cyclorama gallery, and you have to sit through a short movie about the battle first because the museum, movie, and cyclorama are on one ticket. But the new museum is very nicely designed and installed (and extensive), the movie is not too long and very well done, and the tickets are reasonably priced.

The painting (by Paul Philippoteaux, 1884) was installed in the new facility with its diorama foreground illusions recreated. They run a 15-minute narrated sound and light show to recreate Pickett's Charge (dawn over the battlefield is amazing), then they bring up the lights for a few minutes so you can see the entire painting clearly. In some spots the diorama leads seamlessly into the painting. It's still an amazing illusion and it takes your breath away.

The Cyclorama painting was previously housed in a Richard Neutra-designed building at Gettysburg. The Neutra building is scheduled for demolition in December 2008, but there is litigation attempting to stop it. That will be a difficult case -- battlefield restoration versus Modern architecture preservation.

Friday, October 03, 2008

Federal Agencies Digitization Guidelines Initiative

The Federal Agencies Digitization Guidelines Initiative site went live on September 30, 2008. The initiative represents a collaborative effort between U.S. government agencies to establish a common set of guidelines for digitizing historical materials. Participants include the Defense Visual Information Directorate, the Library of Congress, the National Agricultural Library, the National Archives and Records Administration, the National Gallery of Art, the National Library of Medicine, the National Technical Information Service, the National Transportation Library, the Smithsonian Institution, the U.S. Geological Survey, the U.S. Government Printing Office, and The Voice of America.

The Still Image Working Group is focusing its efforts on books, manuscripts, maps, and photographic prints and negatives. There are draft "Digital Imaging Framework" and "TIFF Image Metadata" documents available. The Audio-Visual Working Group effort will cover sound and video recordings and will consider the inclusion of motion picture film as the project proceeds. That group is still at the document drafting stage.

Thursday, October 02, 2008

interesting re-use of American Memory content

From Boing Boing:

American Memory is a new and compelling DVD coming from extended Skinny Puppy posse members William Morrison and Justin Bennett later this year. It took me a while to figure out exactly what was going on (and exactly who was responsible), but that didn't detract from this hypnotic and ultimately forceful piece.

The voice in the clip on the DVD's trailer is that of former slave Alice Gaston, interviewed in her eighties for the Library of Congress in 1941. The actress is lip-synching to her dialogue. Videomaker William Morrison explains that the whole project works this way, using audio from the American Memory Archive along with new and processed footage. And, of course, Skinny Puppy music.

According to Morrison: "The theoretical context of the project is that some time in the very distant future, long after America is gone, some artists scouring the backwater of whatever the net has become discover the American Memory Archive. They have no context for its meaning but are intrigued by the sights and sounds. They create surreal impressions of the material they find and broadcast it back through time. A quantum radio channel beamed into the subconscious minds of the 21st century."

A few different permutations of the band will be playing a show on December 4 at the Gramercy in NYC.

Wednesday, September 24, 2008

new version of getty introduction to metadata

The third edition of the Getty Introduction to Metadata -- edited by Murtha Baca, with essays by Tony Gill, Anne J. Gilliland, Maureen Whalen, and Mary Woodley -- is now available online and in hard copy. This is a very useful overview and it's nice to see it updated.

Thursday, September 18, 2008

generational myths

Siva Vaidhyanathan has a great article in The Chronicle Review entitled "Generational Myth." Siva and I first met through our online discussion of this topic -- I very strongly agree with him on this issue.

Lorcan Dempsey posted about a couple of blog posts by Andy Powell and Dave White giving their takes on this issue. Dave's proposed "Resident" and "Visitor" categories, and his acknowledgment of the spectrum of behaviors those categories represent, are a well-considered take on how libraries might better understand the learning styles of distance students in particular. I'm obviously not a fan of categorizing humans -- people are notoriously hard to pigeonhole. But I think these are actually more akin to personas than categories, like those you'd develop as an exercise when designing a new online service. Not unerringly accurate, but not without usefulness. It certainly supplements the often simplistic thinking about our users as "faculty" or "graduate students" or "undergraduates" or "the public."

I also strongly recommend Janna Brancolini's blog Generation Underrated, her response to Mark Bauerlein's The Dumbest Generation: How the Digital Age Stupefies Young Americans and Jeopardizes Our Future (Or, Don't Trust Anyone Under 30). Check out what someone under 30 has to say, who also happens to be the daughter of a digital librarian.

grapes need a eula?

From Serious Eats, an image of an empty bag of grapes ... with a EULA.

The recipient of the produce contained in this package agrees not to propagate or reproduce any portion of the produce, including (but not limited to) seeds, stems, tissue and fruit.
To me this is particularly amusing because they're seedless grapes ...

Wednesday, September 17, 2008

djatoka

There is major buzz around the announcement that the Los Alamos National Laboratory Research Library has released djatoka, a "reuse friendly" open source JPEG2000 image server. It's available on SourceForge under a GNU Lesser General Public License.

There's an excellent D-Lib article that fully describes the server. I love the first sentence of the article: "The Digital Library Research & Prototyping Team at the Los Alamos National Laboratory (LANL) enjoys tackling challenging problems." Now there's an understatement!

We did some explorations with Kakadu (one of the components of djatoka) when I was at UVA, and we use Aware at LC. I plan to take a long, hard look at this.
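For a sense of how a client talks to such an image server: djatoka exposes its services through OpenURL-style requests, where a service identifier and `svc.*` key/value pairs select an operation like region extraction. The sketch below just builds such a request URL; the parameter names follow the djatoka resolver interface as described in the D-Lib article, but the host, identifier, and exact names here should be treated as illustrative assumptions.

```python
from urllib.parse import urlencode

def djatoka_region_url(resolver, rft_id, region, level, fmt="image/jpeg"):
    """Build an OpenURL-style djatoka request for a region of a JP2 image.

    resolver -- base URL of the djatoka resolver servlet (hypothetical host)
    rft_id   -- identifier of the JP2 image in the repository
    region   -- (y, x, height, width) tuple for the region of interest
    level    -- resolution level to extract from the JPEG2000 code stream
    """
    params = {
        "url_ver": "Z39.88-2004",
        "rft_id": rft_id,
        "svc_id": "info:lanl-repo/svc/getRegion",
        "svc_val_fmt": "info:ofi/fmt:kev:mtx:jpeg2000",
        "svc.format": fmt,                               # output MIME type
        "svc.level": level,                              # resolution level
        "svc.region": ",".join(str(v) for v in region),  # region spec
    }
    return resolver + "?" + urlencode(params)
```

The appeal of this style is that region and resolution are chosen per request, so one archival JP2 can serve thumbnails, screen views, and deep zoom without pre-generating derivatives.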

smithsonian digitization initiative

There's an announcement on CNN that the Smithsonian plans to put its 137 million-object collection online. The new Smithsonian Secretary G. Wayne Clough said in an interview that they do not yet know how long it will take or how much it will cost to digitize the full collection, and will do it as money becomes available. A team will prioritize which artifacts are digitized first. They plan to focus on making the collections usable for the K-12 audience.

When I was at the Smithsonian yesterday for David Weinberger's talk, this seemed to be a buzzing topic of discussion among audience members; one Smithsonian employee even mentioned it in a question to Weinberger, expressing a certain level of surprise.

Tuesday, September 16, 2008

small pieces loosely joined by metadata



Today I had the extreme pleasure of attending a talk by David Weinberger (The Cluetrain Manifesto, Small Pieces Loosely Joined, Everything is Miscellaneous) at the Smithsonian, entitled "Knowledge, Noise and the End of Information." It was webcast, and I strongly suggest viewing it if you can.

There was lots of interesting discussion about the definition of information, the innately social nature of the human race and how social interaction is a vital aspect of information discovery, and how the loosely joined and messy nature of the internet just reflects human nature and is not a bad thing. He also stressed that one can never know what digital information might be of importance in the future, so we should, as cultural institutions, be striving to keep as much as possible. He also touched on the importance of brand and authoritativeness, but not to equate that with control.

A word I did not expect to hear today, let alone about a hundred times, was "metadata." The cellphone image above is a shot of one of his concluding statements. He talked a lot about the importance of metadata, whether it be authoritative cataloging, community tagging, or contextual relationships through linking. Since we cannot ever imagine all the uses for our digital content, we cannot possibly expend the costly effort to provide all the descriptive metadata that every community might want or need -- so all three are complementary and of equal value.

One of my take-aways was that this again shows the importance of just getting digital content out there. Let the content express itself through its authoritative metadata, but also provide open access and support multiple mechanisms through which it can be incorporated into new contexts and uses and gain new descriptions.

Monday, September 15, 2008

open access to museum collections

Last Friday there was a post on Open Access News that Wake Forest University's Anthropology Museum had issued a press release about the launch of its online collections, supported by an IMLS grant.

I welcomed this news on many fronts -- there aren't enough ethnographic or archaeological collections online; the museum is using Re:discovery, a great product geared toward small museums; and I have a number of friends with ties to Wake Forest and I've visited Winston-Salem many times and have a fondness for the area.

What made me sit down to think about this for a few days was the passing description of this as an Open Access project.

I worked for many years in the museum community, and every museum that I ever worked for or consulted for wanted to make its collections available in one digital form or another. The Museum Computer Network was founded in 1967 to enable museums to automate their processes and convert collections records to digital form. Museums were among the earliest institutions to share their collections online in the mid-1990s. The University of California Museum of Paleontology had a web site in 1994. The Fine Arts Museums of San Francisco brought their Thinker "imagebase" online in 1996 -- and they had volunteers assist with an early form of experimental user-supplied subject metadata: proto-tagging. By 1997 the National Gallery of Art provided access to over 100,000 objects in its collection, and the Los Angeles County Museum of Art experimented with converting print museum catalogs into freely available online publications.

Sure, there have been lengthy discourses about levels of access to the digital media surrogates and questions of rights and control of those new media assets, and there is some information about the acquisition of objects that's subject to privacy restrictions, but no museum wants to limit discovery of their collections -- they want to facilitate their collections' use in research and teaching.

I've just not heard it described as "open access" before.

I'm not saying that it isn't a sort of open access initiative -- it most obviously is -- but I just think of it as such a normal museum activity I don't categorize it in my mind as anything other than business as usual. Then it hit me -- for the past 15 years museums have been major players in the open access movement without necessarily always knowing it.

Labeling this an open access initiative re-contextualizes this core museum activity into a different realm -- one that I hope will make museum collections information more visible and reinforce the importance of all categories of open access content.

Friday, September 12, 2008

nsdl metadata registry

This afternoon a group of us had the opportunity to sit down with Jon Phipps, implementer of the NSDL Metadata Registry.

I knew that such a thing existed. I understand RDF. I know about SKOS. I hadn't really given a lot of thought as to how to best take advantage of it.

Today, I had one of those skies-are-opening-and-angels-are-singing-from-on-high moments. RDF can be used to model relationships between concepts and potentially enforce them through schemas. This can obviously be applied to improve discoverability when a hierarchical taxonomy is employed. Then my LC colleague Clay Redding said that he was experimenting with multiple schemas and managing additional local alternative labels in addition to authoritative preferred labels. And then Jon and Ed Summers mentioned the potential for this tool to map across schemas. My a-ha moment was understanding the potential for formalized mappings across metadata schemas to improve discoverability within and across collections described with heterogeneous taxonomies and vocabularies.

I remember using Chenhall's Nomenclature in records for ethnographic objects where we recorded every level of the hierarchy in its own field -- it was madness. I remember when we were in the early days of the AAT, busily submitting new terms and building the hierarchies, our dream was searching for "case furniture" and getting results with bookcases, chests, desks, wardrobes, and every semantic child where "case furniture" never appeared in the record.
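Just to make that dream concrete, here's a toy sketch of hierarchy-expanded search in Python. The terms, the narrower-term table, and the sample records are all invented for illustration -- this is the general SKOS-style broader/narrower idea, not the AAT or any real tool:

```python
from collections import deque

# Toy narrower-term table in the spirit of a SKOS/AAT hierarchy.
# All terms and relationships here are invented for illustration.
NARROWER = {
    "case furniture": ["bookcases", "chests", "desks", "wardrobes"],
    "chests": ["chests of drawers"],
}

def expand(term):
    """Return the query term plus all of its semantic descendants."""
    seen, queue = {term}, deque([term])
    while queue:
        for child in NARROWER.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def search(query, records):
    """Match records whose subject falls anywhere under the query term."""
    terms = expand(query)
    return [r for r in records if r["subject"] in terms]

records = [
    {"title": "A Federal bookcase", "subject": "bookcases"},
    {"title": "Shaker chest of drawers", "subject": "chests of drawers"},
    {"title": "Windsor chair", "subject": "chairs"},
]

# The bookcase and the chest of drawers both match, even though
# "case furniture" never appears in either record.
print([r["title"] for r in search("case furniture", records)])
```

The real work, of course, is in building and mapping the vocabularies themselves; once those formalized relationships exist, the retrieval side really is this simple.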

I remember some research at USC in the late 1990s about thesaurus-enabled searching. OCLC's Metadata Switch project has done some work in cross-schema mapping. I know this is very difficult to accomplish. Today was the first time I saw a tool that might make the conceptual mapping simpler. But not simple. This is a potentially massively overwhelming task if it can't be done programmatically to a large extent.

I'm coming late to the party, but now I'm really intrigued by what might be accomplished in this arena.

Tuesday, September 09, 2008

Cory Doctorow book of essays

I am a big fan of Cory Doctorow's writing -- his fiction and his essays on technology, rights, and privacy. Via BoingBoing comes word of his new book of essays -- Content: Selected Essays on Technology, Creativity, Copyright, and the Future of the Future -- which he is making available as a free Creative Commons licensed PDF download.

If you haven't read Cory Doctorow yet, you should. I don't always agree with everything he says, but he is thoughtful and technologically savvy and writes thorough essays on very relevant topics in an entertaining style.

I've read some of these essays before, but having them together in one beautifully-designed volume that I can always refer to is the proverbial good thing.

LoC Repository Development Group hiring

Our group has a position open. Visit the LoC jobs page and search for posting "080214". The posting does not mention our unit specifically, so this is a heads-up that the job is with us. We're still a relatively new group, working on a variety of projects with many units across the Library and developing our group's role in the institution.

The application period closes on October 3, and that is an absolute deadline. You must apply using an online federal job application system -- it's a lengthy form that requires some time to fill out. Be prepared with electronic copies of your documents to cut and paste.

EDIT (9/24/2008): This position reports to the Director of the Repository Development Group. Everyone in the team -- including me -- reports to the Director. There is no additional management structure.

Monday, September 08, 2008

google newspaper digitization

Google is digitizing newspapers.

Not only will you be able to search these newspapers, you'll also be able to browse through them exactly as they were printed -- photographs, headlines, articles, advertisements and all.

This effort expands on the contributions of others who've already begun digitizing historical newspapers. In 2006, we started working with publications like the New York Times and the Washington Post to index existing digital archives and make them searchable via the Google News Archive. Now, this effort will enable us to help you find an even greater range of material from newspapers large and small, in conjunction with partners such as ProQuest and Heritage, who've joined in this initiative. One of our partners, the Quebec Chronicle-Telegraph, is actually the oldest newspaper in North America—history buffs, take note: it has been publishing continuously for more than 244 years.

You’ll be able to explore this historical treasure trove by searching the Google News Archive or by using the timeline feature after searching Google News. Not every search will trigger this new content, but you can start by trying queries like [Nixon space shuttle] or [Titanic located]. Stories we've scanned under this initiative will appear alongside already-digitized material from publications like the New York Times as well as from archive aggregators, and are marked "Google News Archive." Over time, as we scan more articles and our index grows, we'll also start blending these archives into our main search results so that when you search Google.com, you'll be searching the full text of these newspapers as well.
It's interesting that they're working directly with publishers and with aggregators such as ProQuest to digitize and improve discoverability of back files. That's good news, but do they also plan to work with major newspaper open access projects such as the National Digital Newspaper Program? Are they digitizing any collections in addition to publisher collections?

When I last looked at the Google news archive in September 2006 I found that way too much of the content was pay-per-view, made you pay even if your institution had licensed subscription access, and didn't work with OpenURL resolvers. I don't see that any of that has changed. I hope it will.

vintage museum photos

Via BoingBoing, check out these fabulous vintage photos from the American Museum of Natural History. I love dioramas, and the exhibit installation images are just great. Taxidermy mounting, diorama background painting, articulating dinosaur bones, casting animal models ... And the vintage exhibitions! I love the images of earnest children being led around ... and doing so-called Indian dances in their construction paper bonnets. State-of-the-art, 1900s-1970s.

Saturday, September 06, 2008

ambient awareness

This week's New York Times Magazine has a piece by Clive Thompson that explores issues around ambient awareness and privacy. Facebook, twitter, flickr, dopplr, and texting and blogging more generally. Is it narcissistic to broadcast your status using awareness tools? Are these tools to improve connectedness in a more mobile and global human ecology -- the ultimate tools for building and maintaining relationships?

This is the paradox of ambient awareness. Each little update — each individual bit of social information — is insignificant on its own, even supremely mundane. But taken together, over time, the little snippets coalesce into a surprisingly sophisticated portrait of your friends’ and family members’ lives, like thousands of dots making a pointillist painting. This was never before possible, because in the real world, no friend would bother to call you up and detail the sandwiches she was eating. The ambient information becomes like “a type of E.S.P.,” as Haley described it to me, an invisible dimension floating over everyday life.
...
And when they do socialize face to face, it feels oddly as if they’ve never actually been apart. They don’t need to ask, “So, what have you been up to?” because they already know. Instead, they’ll begin discussing something that one of the friends Twittered that afternoon, as if picking up a conversation in the middle.
An interesting section focuses on the so-called "Dunbar Number" -- just how many people can you be "friends" with, anyway? According to anthropologist Robin Dunbar, about 150. Can you max out on social connectedness? Not really, since many of one's ambient connections are weak ties, not close, intimate friends. But weak ties are an equally important part of social and professional networks.

I find it useful to check in on my Facebook account and see the status newsfeeds of my friends and colleagues. I have personally met all but a handful of them, and I believe that they control their feeds and filter what they write in their statuses in ways that maintain their chosen levels of privacy. I keep my status updated. I blog, and I know and expect that people who have never met me read it. But is the ability to follow personal newsfeeds and tweets of people you will never know a creepy invasion of privacy, making it too easy to develop parasocial relationships? Or is it all just part of ubiquitous ambient awareness where participation is increasingly not optional?

I originally refused to blog or join Facebook because I thought it was vain to assume that anyone wanted to know what I was thinking or doing, and that I'd be giving up my privacy. OK, I have given up some of my privacy, but I've also made new connections I might never have made otherwise, re-established relationships that had gone dormant, and built stronger ties with geographically disparate friends. While I'm not willing to give up my privacy for a free cup of coffee, I am willing to give up some privacy to do that.

Wednesday, September 03, 2008

HathiTrust

The University of Michigan has announced that their MBooks initiative has grown into a shared repository effort called the HathiTrust (pronounced hah-TEE).

HathiTrust was originally a collaboration of the thirteen universities of the Committee on Institutional Cooperation (CIC) to establish a repository for those universities to archive and share their digitized collections. All content to date has been supplied by the University of Michigan and the University of Wisconsin, and Indiana University and Purdue University will soon be contributing their digital materials. 20% of its current content is open access and 80% is restricted. Don't look for a single search interface yet -- it's planned. As they say: "Good, useful, technology takes time.... and the strength and insight born of collaborative work."

The new HathiTrust initiative has been funded for an initial five-year period beginning January 2008, and is now open to other institutions. Partners will be charged a one-time start-up fee based on the number of volumes added to the repository, in addition to an annual fee for the curation of the data. They already support both open access and dark archive materials, and will also do so for new partners.

Their July 2008 monthly report gives a good sense of their activities. It is interesting to note that the only initial ingest workflow supported is the Google partner workflow. That's not too surprising, since this work is based on the MBooks project developed in support of Google content workflows. That in and of itself ensures that there are many institutions who'll be considering partnership.

This announcement is exceptionally exciting. I look forward to its development as a service.

Monday, September 01, 2008

kete 1.1

I've blogged about Kete before - version 1.1 has been released.

From the announcement:

Kete 1.1 is now available with a giant helping of new features and improvements. This is also the first release where you can grab Kete from our code repository's new home at Github.com. See http://kete.net.nz/site/topics/show/25-downloads for details or browse the code online at http://github.com/kete/kete/.

For those who haven't seen Kete in action, Kete is open source software that enables communities, whether the community is a town or a company, to collaboratively build their own digital libraries, archives and repositories. Kete combines features from Knowledge and Content Management Systems as well as collaboration tools such as wikis, blogs, tags, and online forums to make it easy to add and relate content on a Kete site. You could create a service like Google's Knol for your community using Kete.

An in-depth list of features and issues resolved can be found at http://kete.net.nz/documentation/topics/show/182-kete-11-features-and-bug-fixes , but here are some highlights:

Saturday, August 30, 2008

web archiving

The Library of Congress has a phenomenal Web Capture team, staffed with very dedicated people who put a lot of effort into identifying web sites that best document an event, crawling and capturing sites through partner Internet Archive, working with cataloging to get the sites described to enhance discoverability, doing quality control to make sure the archived sites will run correctly, and then making the archived sites live for public access. This process can take a very long time to ensure that a site is fully captured, preserved, and accessible.

The Web Capture team is, as in previous years, documenting the 2008 elections. They don't crawl sites without permission, and they always send requests. A colleague at another library sent me a link to a post and series of comments on Wonkette reacting to a LoC request to capture the site. The post itself is fine. It is more than a bit surreal to get such a request from LoC -- they're going to collect what I write? -- and making fun of it is OK.

Some of the comments, however, are another story.

The reaction to the notice of the request includes strings of profanity, vulgarity, and various exhortations to "archive this, LoC!" Some comment that it's possibly a fake request similar to a Nigerian scam, some liken it to FBI wiretapping, and one comment says that it's a waste of taxpayer dollars to have federal employees reading websites in order to identify what should be archived. One comment conjectures that by "capture," we mean print out the site and store it in a box next to the Ark of the Covenant. Some of the comments are obviously humorous and some are serious, and it's hard to tell with others.

I have a sense of humor, especially about political topics. Of course it's funny to the Wonkette participants that whatever is said, whether profound or mundane or profane, the Library of Congress will crawl it. I remember my own reaction when I was approached about submitting my email to the MCN archives covering the period when I was on its board, which contained such highlights as "The membership brochure is at the printer" and "Don't faint when you see how much the conference hotel wants to charge us for internet access." But for some reason this really struck a nerve because some of the commenters were so "f--- you, Library of Congress." That saddened and angered me.

It's a huge effort to collect ever-changing interactive born-digital resources compared to print materials, but we and many other libraries do it because it's an equally important form of publishing. Libraries collect whatever is relevant regardless of the form of publication. Sites like these are important because they reflect what's really being said and what people really think about the political process. What about that isn't worth collecting?

I'll cop to being a bit overly sensitive on this, but only because I place very high value on such collecting activities.

Friday, August 29, 2008

the omnivore's 100

The blog Very Good Taste has come up with a list of 100 items that every omnivore should try in his or her life. Not surprisingly, it has turned into a meme, which I found through Serious Eats. Basically, you copy the list from Very Good Taste's The Omnivore's 100 and post it to your blog, bolding the items you've tried and striking through any you would never try.

1. Venison
2. Nettle tea
3. Huevos rancheros
4. Steak tartare
5. Crocodile [I've had alligator on a number of occasions -- would that count?]
6. Black pudding [I'll eat it but I don't seek it out]
7. Cheese fondue
8. Carp
9. Borscht
10. Baba ghanoush
11. Calamari
12. Pho
13. PB&J sandwich
14. Aloo gobi
15. Hot dog from a street cart
16. Epoisses
17. Black truffle
18. Fruit wine made from something other than grapes
19. Steamed pork buns
20. Pistachio ice cream
21. Heirloom tomatoes
22. Fresh wild berries
23. Foie gras [I've never understood foie gras worship]
24. Rice and beans
25. Brawn, or head cheese [my mother loved it but couldn't convince me to eat it growing up]

26. Raw Scotch Bonnet pepper [there was the incident with some peppers past their prime, the disposal, and the resultant evacuation of my kitchen]
27. Dulce de leche
28. Oysters
29. Baklava
30. Bagna cauda
31. Wasabi peas
32. Clam chowder in a sourdough bowl
33. Salted lassi
34. Sauerkraut
35. Root beer float

36. Cognac with a fat cigar
37. Clotted cream tea
38. Vodka jelly/Jell-O
39. Gumbo
40. Oxtail
41. Curried goat

42. Whole insects
43. Phaal
44. Goat’s milk
45. Malt whisky from a bottle worth £60/$120 or more
46. Fugu
47. Chicken tikka masala
48. Eel
49. Krispy Kreme original glazed doughnut
50. Sea urchin [I don't much care for it, and Bruce is allergic to it]
51. Prickly pear
52. Umeboshi
53. Abalone
54. Paneer
55. McDonald’s Big Mac Meal
56. Spaetzle
57. Dirty gin martini
58. Beer above 8% ABV
59. Poutine
60. Carob chips
61. S’mores
62. Sweetbreads [not a favorite, but I will eat them if I know the preparation will be excellent]
63. Kaolin
64. Currywurst
65. Durian
66. Frogs’ legs
67. Beignets, churros, elephant ears or funnel cake
68. Haggis
69. Fried plantain
70. Chitterlings, or andouillette [are you noticing a trend that offal is a category I don't much care for?]
71. Gazpacho
72. Caviar and blini
73. Louche absinthe
74. Gjetost, or brunost

75. Roadkill
76. Baijiu
77. Hostess Fruit Pie
78. Snail [only once, when a donor at an event handed it to me and I felt I had to eat it]
79. Lapsang souchong
80. Bellini
81. Tom yum
82. Eggs Benedict
83. Pocky
84. Tasting menu at a three-Michelin-star restaurant [a 2-star, yes, not yet at a 3-star]
85. Kobe beef [actually, wagyu, but I'm counting it]
86. Hare
87. Goulash
88. Flowers
89. Horse
90. Criollo chocolate
91. Spam
92. Soft shell crab
93. Rose harissa
94. Catfish
95. Mole poblano
96. Bagel and lox
97. Lobster Thermidor
98. Polenta
99. Jamaican Blue Mountain coffee
100. Snake [I had iguana once in Mexico ...]

Wednesday, August 27, 2008

dead sea scrolls

When I was growing up, my mother had a small selection of books displayed between decorative bookends on her coffee table -- a set of four art history overview volumes with high-quality color reproductions on glossy paper, and a book on the Dead Sea Scrolls. I was fascinated by the volume on ancient art and the book on the scrolls because of their sheer antiquity. I don't remember there being many illustrations in the book, but the story of the discovery of the scrolls was a very engaging one. I don't remember ever asking my Mom why she had that volume on display, or, if I did, what her answer was.

The New York Times today reports on the project to digitize the Scrolls. It's interesting to read that they plan to create new digital images, as well as digitizing the infrared images created of the scrolls in the 1950s.

Tangentially, there was an article in The Australian a couple of weeks ago about the conservation and multi-spectral imaging of scrolls from the Villa dei Papyri at Herculaneum.

Tuesday, August 26, 2008

Executive Director of OCA named

A press release went out tonight naming Maura Marx -- founder of the Digital Library Program at the Boston Public Library -- as the first Executive Director of the Open Content Alliance.

“Maura's background in working both inside and outside the library system will help her communicate with a broad public audience the shape of the new public library services in this digital age." said Brewster Kahle, Digital Librarian of the Internet Archive. “Her dynamic style, deep-seated commitment to open principles, and demonstrated success at implementing partnerships and initiatives in the digital space will be a powerful combination in taking the OCA to the next level.”
I met Maura at a meeting this spring, and I know that she's an excellent choice!

Monday, August 25, 2008

concordance as word cloud

Eric Lease Morgan posted about a cool little hack to present a text concordance as a word cloud. A visualization of a concordance -- what a nice idea! It would be interesting to see one at a larger scale -- for every word in a book. I'd like to see how the visual metaphor scales.
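For the curious, the core of such a hack really is tiny. Here's a rough Python sketch of the frequency-to-display-size idea -- the sample text, stop-word list, and size range are my own invention, not Eric's actual code:

```python
import re
from collections import Counter

# A deliberately tiny stop-word list, just for illustration.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "it"}

def word_cloud_weights(text, min_size=10, max_size=48):
    """Map each non-stopword to a font size proportional to its frequency."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    if not counts:
        return {}
    top = max(counts.values())
    return {
        word: min_size + (max_size - min_size) * n // top
        for word, n in counts.most_common()
    }

sample = "call me ishmael some years ago never mind how long precisely"
for word, size in word_cloud_weights(sample).items():
    print(f"{word}: {size}px")
```

Wrap the output in HTML spans styled with those sizes and you have a word cloud. Scaling it to a whole book is just a matter of feeding in a bigger text -- which is exactly where the transcription problem below comes in.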

Eric said one thing, though, that gives me pause:

"It is a trivial example of how libraries can provide services against documents, not just the documents themselves."

He is absolutely right -- it is a trivial effort to create this useful service. What is still unfortunately not as trivial as it should be is getting access to accurate transcriptions of all the texts one might want to analyze. There are ASCII transcriptions for many, many works, but there is always a question of accuracy, and of whether the desired edition(s) are available. There's OCR, but it's a fair amount of effort to check and correct the output. Google isn't releasing its OCR, but even if they did, that, too, needs correction. Keyboarding is expensive. And many works in copyright haven't been touched for fear of legal action.

We have the ability to build extraordinary analytical tools. Where is the critical mass of text content?

Mickey Mouse copyright

Via Techdirt and the L.A. Times, an interesting overview on the copyright status of Mickey Mouse. The Virginia Sports and Entertainment Law Journal article by Douglas Hedenkamp mentioned is available online through the "Opposing Copyright Extension" site, as is the original student work by Lauren Vanpelt.

vintage tech

I'm a sucker for vintage technology and vintage manuals. I have a small collection of the latter. I'm a big fan of The Computer History Museum in Mountain View, California. I love to read books about the history of computing.

The Alameda County Computer Resource Center (ACCRC) in Berkeley, California is a non-profit group that recycles hardware. The ACCRC has launched the blog "It Ain't Dead Yet" to showcase their more unusual finds, partly to share the wonder and partly to gauge the usefulness and value of the items. Now there's a feed I'm sure to read every day!

Wednesday, August 20, 2008

Registry of U.S. Government Publication Digitization Projects

I didn't know the Registry of U.S. Government Publication Digitization Projects existed:

"The Registry contains records for projects that include digitized copies of publications originating from the U.S. Government. It serves as a locator tool for publicly accessible collections of digitized U.S. Government publications; increases awareness of U.S. Government publication digitization projects that are planned, in progress, or completed; fosters collaboration for digitization projects; and provides models for future digitization projects."

The Registry has recently been updated, and they welcome additions. Institutions need to apply to contribute.

Ithaka's 2006 Studies of Key Stakeholders in the Digital Transformation in Higher Education

Ithaka has recently released the full findings from their 2006 surveys of the behavior and attitudes of faculty members and academic librarians. The faculty study focuses on the relationship between faculty and the library, faculty perceptions and uses of electronic resources, the transition from print to electronic journals, faculty publishing preferences, e-books, digital repositories, and the preservation of scholarly journals. The librarian survey complements the faculty study, exposing the similarities and differences between faculty and librarian views of key topics.

These extended quotes describe two very interesting key perceptual differences:

Over the course of these three surveys, we have tested three “roles” of the library – purchaser, archive and gateway. We have attempted to track how the importance of these three different roles has changed over time. Most highly rated among these roles is that of library as purchaser – faculty don’t want to have to pay for scholarly resources, a finding which holds across disciplines and has remained stable over time. There is slightly more variation by discipline in views on the importance of the library’s preservation function, but valuation of this role is also uniformly high and has remained static over time. The importance of the role of the library as a gateway for locating information, however, varies more widely and has fallen over time.

The declining importance assigned to the gateway role is cause for concern in general, and especially when considered by discipline. The importance to faculty of this role has decreased across all disciplines since 2003, most significantly among scientists. While almost 80% of humanists rate this role as very important, barely over 50% of scientists do so. Beyond the differences between these general disciplinary groups, there also exist substantial variations by individual discipline, as demonstrated by the perceptions of economists. Between 2003 and 2006, the percentage of economists indicating they found the library’s gateway role to be very important dropped almost fifteen percentage points. In 2006, the percentage of economists who believed this gateway role to be very important was actually below the average level of scientists, falling to 48%.

The decreasing importance of this gateway role to faculty is logical, given the increasing prominence of non-library discovery tools such as Google in the last several years. Since 2003, the number of scholars across disciplines who report starting their research at non-library discovery tools, either a general purpose search engine or a specific electronic resource, has increased, and the number who report starting in directly library-related venues, either the library building or the library OPAC, has decreased. Despite the rising popularity of tools like Google, overall, general purpose search engines still slightly trail the OPAC as a starting point for research, and are well behind specific electronic research resources. This overall picture, however, hides a number of variations by discipline; scientists typically prefer non-library resources, while humanists are more enthusiastic users of the library.

The declining importance of this role to faculty stands in stark contrast to the perceptions of librarians, as shown by our 2006 librarian survey. Although the importance of the library’s role as a gateway to faculty is decreasing, rather dramatically in certain fields, over 90% of librarians list this role as very important, and almost as many – only 5 percentage points less – expect it to remain very important in 5 years. Obviously there is a mismatch in perception here.

Librarians at all sizes of institutions see this gateway role as among their primary goals; this, along with the licensing of electronic resources and maintaining a catalog of their resources, are by far the roles most broadly considered important. They expect most of the roles of the library to rise in importance, or at least hold steady, over the next five years, with some notable exceptions to be found in roles focused on nondigital materials, such as roles relating to traditional print preservation and the maintenance of a local print journal collection, which are expected to decline in importance. There are some variations by institution size. Several roles, most notably the development and maintenance of special collections and several more technical tasks such as the management of datasets, are significantly more important at larger libraries than smaller ones. And unlike smaller libraries, larger libraries view licensing as their single most important activity, with less emphasis put on the gateway and catalog roles. This may be a sign that leading-edge libraries are beginning to change their priorities to match those of faculty and students. Still, the mismatch in views on the gateway function is a cause for further reflection: if librarians view this function as critical, but faculty in certain disciplines find it to be declining in importance, how can libraries, individually or collectively, strategically realign the services that support the gateway function?

... and

Perceptions of a decline in dependence are probably unavoidable as services are increasingly provided remotely, and in some ways these shifting faculty attitudes can be viewed as a sign of library success. One can argue that the library is serving faculty well, providing them with a less mediated research workflow and greater ability to perform their work more quickly and effectively. In the process, however, they may be making their own role less visible. This indicates a challenge facing libraries in the near future – as faculty needs are increasingly met without the direct intermediation of the library, the importance of the library decreases. Libraries must consider ways which they can offer new and innovative services to maintain, or in some cases recapture, the attention and support of faculty.

Read their full white paper, or review the raw data from the faculty survey or the librarian study.

Tuesday, August 19, 2008

is everything moving into the cloud?

There's an essay entitled "The Future of the Desktop" by Nova Spivack of Twine on ReadWriteWeb. It's a pretty thoughtful opinion piece on the trend of users moving away from desktop applications toward Web-hosted ones that run in browsers.

He mentions something that I think is vital: everyone has a sense of the personal and "mine," so there has to be some sort of place that each of us can consider to be our "home." He rightly declares that it's not going to live in any one location or on any one device. His "Webtop" paradigm is that instead of launching the browser from the desktop, one would launch the "desktop" from the browser, and that desktop is the personal location where we do our work and interact with the world.

I'm not sure that I fully buy his metaphor that we'll give up being "librarians" ("filing" and managing resources) and fully become "daytraders" (discovering, filtering, and monitoring trends), in part because search will replace the need to "file" things.

For one, librarians _actually_ do all of the above, but I'm not going to fault him just because he doesn't know what librarians do in their jobs.

What I'm having trouble with is the notion that just because we're working in the cloud we'll stop organizing resources. The "search will replace cataloging" argument that we've heard in libraries is one that I can't buy. Search doesn't work worth a damn if there isn't some level of organization and filing, aka metadata or cataloging. How will these daytraders efficiently filter what they discover and note trends if they aren't organizing and filing? It is true that we'll be managing fewer files _locally_, but we'll be organizing even more files in the cloud. He rightly identifies that there will be more shared, social spaces, and he says that communities will "seamlessly and collectively add, organize, track, manage, discuss, distribute, and search for information of mutual interest." Maybe it's a semantic distinction, but to me that's a resource management activity, just in a much larger and more social realm.
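A toy sketch of the point (the items and tags here are invented for illustration, not any real service's data): even the "daytrader" activities of filtering and trend-spotting presuppose that someone has filed the items with at least lightweight metadata -- here, plain tags.

```python
from collections import Counter

# A handful of cloud-stored items, each "filed" with lightweight metadata (tags).
items = [
    {"title": "Fedora ingest notes", "tags": {"repositories", "fedora"}},
    {"title": "Flickr backup script", "tags": {"photos", "preservation"}},
    {"title": "Metadata crosswalk draft", "tags": {"metadata", "cataloging"}},
    {"title": "Fedora performance tests", "tags": {"repositories", "fedora"}},
]

def filter_by_tag(items, tag):
    """Discovery and filtering can only surface what the assigned tags expose."""
    return [item["title"] for item in items if tag in item["tags"]]

def trending_tags(items):
    """Spotting 'trends' is just counting over that same metadata."""
    return Counter(t for item in items for t in item["tags"]).most_common()
```

Strip the tags away and both functions return nothing useful -- which is the cataloging argument in miniature: the filtering and trend-watching are themselves resource management activities.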

And ah, the dream of semantic search. And the dream of the smart webtop or desktop, where context is easily understood and parsed for data coming in and being queried. I want to believe. I'm waiting.

Where I do buy into the cloud is from a standpoint of portability. Even moving between work and home on two machines, I have found myself storing and organizing more of my resources out there rather than in here. Flickr. Delicious. Bloglines. LibraryThing. Web mail. It would waste more time than I could imagine to keep my life in sync between just two locations, let alone more.

I worry about security and preservation a lot. I lost my home desktop PC drive last year. What if that drive I lost was my only copy (it wasn't) AND flickr suffered a catastrophic failure? There goes the documentation of the past three and a half years of my life. As someone whose career is centered on digitization and management and use of digital files, I have been trained through experience to think in terms of the catastrophic. And to think about rights and ownership. The cloud must become more secure, aware of identities, distributed, and replicated in its file management to assuage my concerns before I fully buy in.

digital is not to blame

I just read a Wired essay entitled "The Critics Need a Reboot. The Internet Hasn't Led Us Into a New Dark Age." In one of those great moments in synchronicity, over the weekend I started reading a blog written by the daughter of a colleague: Generation Underrated. She was spurred to blog as a reaction to Mark Bauerlein's The Dumbest Generation: How the Digital Age Stupefies Young Americans and Jeopardizes Our Future (Or, Don't Trust Anyone Under 30), which is also mentioned in the Wired essay. I haven't read the book, but I am reasonably sure that it would make me crazy to do so. From Janna Brancolini's blog, referring to studies noted in the first chapter of Bauerlein's book:

A test was given to high school seniors in 1955. The same questions appeared on a Gallup survey given to college seniors in 2002. The college seniors in 2002 scored no better than the high school seniors had in 1955. (29)

In other words, the first chapter doesn’t give a single statistic that demonstrates that people under 30 know less than previous generations, either now or when the members of those generations themselves were under 30.

Bauerlein acknowledges this lack of empirical comparisons by saying, “Even if we grant the point that on some measure today’s teenagers and 20-year-olds perform no worse than yesterday’s, the implication critics make seems like a concession to inferiority. Just because sophomores 50 years ago couldn’t explain the Monroe Doctrine or identify a play by Sophocles any more than today’s sophomores doesn’t mean that today’s shouldn’t do better, far better.”

Janna greatly simplifies Bauerlein's argument thusly:
In a nutshell: we’re dumb because we don’t know anything, we’re dumb because we’re letting the Internet and cell phones be used for evil instead of good, and we shouldn’t be dumb since we have so much technology available to combat our overwhelming dumb-ness.
From the Wired essay:
But the latest crop of curmudgeons fail to acknowledge that there is not much new in this parade of the preposterous. The US has a long and colorful history of being taken in by the erroneous and irrational: Salem witches, the "War of the Worlds" radio broadcast, phrenology, and eugenics are just a few choice examples. The truth is that Americans often approach information — online and off — with a particular mindset. "Antirational junk thought has gained social respectability in the United States during the past half century," notes Susan Jacoby in The Age of American Unreason. "It has proved resistant to the vast expansion of scientific knowledge that has taken place during the same period." Jacoby argues that long-standing American values like rugged individualism and the need to question authority have metastasized into reflexive anti-intellectualism and disdain for "eggheads," "elites," and pretty much anyone who might be described as credentialed. This cancerous irrationalism isn't pretty, but it isn't technology's fault, either.
Readers of this blog know that I have a very negative reaction to generalizations like "digital generation" and "digital natives." In the same vein, blaming technology and saying that this generation is the dumbest seems ludicrous to me. IF we accept that this is the dumbest generation (and I don't), there are other places to identify causation/lay blame. Underfunded school systems with a focus on standardized testing rather than critical thinking. The self-esteem movement, which, when taken to extremes, does away with competition and realistic assessment. But technology? Please. Technology and media access are more ubiquitous than ever (and media increasingly target younger consumers), but their use or lack of use hasn't made an entire generation less educated.

Friday, August 15, 2008

Patry restoring old posts

William Patry has decided to restore many of his posts which he deleted when he closed down his blog. He has been laboriously identifying the posts and plans to restore them very soon.

Red Island Repository Institute

This week the Red Island Repository Institute took place, with a week-long immersion in all things Fedora. The instructors were Sandy Payette, Richard Green, and Matt Zumwalt. It would be hard to think of anyone better suited to teach the institute than these three!

PowerPoint slides from the presentations are online, and they provide a great overview of Fedora.

Thursday, August 14, 2008

free copyright licenses upheld

Great news from Larry Lessig:

I am very proud to report today that the Court of Appeals for the Federal Circuit (THE "IP" court in the US) has upheld a free (ok, they call them "open source") copyright license, explicitly pointing to the work of Creative Commons and others. (The specific license at issue was the Artistic License.) This is a very important victory, and I am very very happy that the Stanford Center for Internet and Society played a key role in securing it. Congratulations especially to Chris Ridder and Anthony Falzone at the Center.

In non-technical terms, the Court has held that free licenses such as the CC licenses set conditions (rather than covenants) on the use of copyrighted work. When you violate the condition, the license disappears, meaning you're simply a copyright infringer. This is the theory of the GPL and all CC licenses. Put precisely, whether or not they are also contracts, they are copyright licenses which expire if you fail to abide by the terms of the license.

Important clarity and certainty by a critically important US Court.

Wednesday, August 13, 2008

LibraryThing covers

Last week LibraryThing announced that they were making a million free book covers available. A LibraryThing Developer Key is required, which any LibraryThing member can get.

There are some rules:

  • Retrieve no more than 1,000 covers per day.
  • If covers are fetched through an automatic process (e.g., not by people hitting a web page), you may not fetch more than one cover per second.
  • Do not make LibraryThing cover images available to others in bulk. You may cache bulk quantities of covers.
  • Use must not involve or promote a LibraryThing competitor.

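The first two rules amount to client-side rate limiting. Here's a minimal sketch of how a harvesting script might honor them; the `CoverFetchLimiter` class is my own illustration, and the URL pattern is an assumption -- check LibraryThing's developer documentation for the real one.

```python
import time

# Assumed pattern, for illustration only -- verify against LibraryThing's
# developer documentation before using.
COVER_URL = "http://covers.librarything.com/devkey/{key}/medium/isbn/{isbn}"

class CoverFetchLimiter:
    """Enforces the stated limits: at most `daily_cap` covers per day and
    no more than one automated request per `min_interval` seconds."""

    def __init__(self, daily_cap=1000, min_interval=1.0, clock=time.monotonic):
        self.daily_cap = daily_cap
        self.min_interval = min_interval
        self.clock = clock          # injectable for testing
        self.count = 0
        self.window_start = None
        self.last_request = None

    def acquire(self):
        """Wait until a request is allowed; raise once the daily cap is hit."""
        now = self.clock()
        # Reset the counter when a new 24-hour window begins.
        if self.window_start is None or now - self.window_start >= 86400:
            self.window_start = now
            self.count = 0
        if self.count >= self.daily_cap:
            raise RuntimeError("daily cover limit reached; resume tomorrow")
        # Space automated requests at least min_interval seconds apart.
        if self.last_request is not None:
            wait = self.min_interval - (now - self.last_request)
            if wait > 0:
                time.sleep(wait)
        self.last_request = self.clock()
        self.count += 1
```

A script would call `acquire()` before each fetch of `COVER_URL.format(...)` and catch the `RuntimeError` to stop for the day, keeping it comfortably inside the terms above.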
Tim Spalding admits that this service competes with Amazon's web services and other commercial vendors, but LibraryThing's Terms of Service are far more open.

After the announcement I wondered how this was legally possible for such a large number of covers since there are so many variations of rights regarding cover designs. Who holds the rights? The publishers? The designers? Third parties? It's likely it's a wide variety of all of the above. Should we start talking about orphan work book cover designs?

Yesterday Mary Minow posted about this at LibraryLaw Blog. She posits an interesting possibility that this could fall under section 113. Read the comments for more discussion from Peter Hirtle about whether this might also be transformative use of thumbnails that could possibly be covered under fair use. Peter also rightly mentions the market for cover images, since effect on the market is one of the tests for fair use.

Tuesday, August 12, 2008

Aurora

Mozilla Labs is sponsoring a Concept Series -- "... a forum for surfacing, sharing, and collaborating on new ideas and concepts. Our goal is to bring even more people to the table and provoke thought, facilitate discussion, and inspire future design directions for Firefox, the Mozilla project, and the Web as a whole."

The first featured concept is Aurora, from Adaptive Path, a vision for the future of browsers and the web. This isn't a product; it's a visualization of an interactive 3-D navigational paradigm tied to ideas about personalization, authentication, and the mobility of a user's preferences, history, and context.

It's worth looking at. There's a quick guide to Aurora's interface and a descriptive concept document.

Sunday, August 10, 2008

on the mastering of new technologies

Dorothea wrote a post to which my only reply can be "Amen, Sister!" She references a great post by Steve Lawson.

I also feel that I'm at the upper end of the technological middle ground. I'm a journeyman scripter and not really a programmer. I still have digital content production chops. My XML markup skills and metadata fu are strong. I can tell you a lot about the inner workings of Fedora. I can haul out my atrophying JavaScript, SQL, and Perl skills, and dredge up my minimal PHP skills. I fondly remember my ColdFusion days, and my days employing Lingo in Director to create Shockwave apps.

I'm not really up on the tools that are all the rage these days -- Python, Ruby, Django, or even Java. I've been so focused on managing projects that I've lost some of my technological edge. Where do I go to regain/retain it? Especially since I am no longer following a path where I spend any time writing code. I am often asked why I don't go to code4lib -- it's not exactly that I feel over my head, but I'm just not doing the hands-on thing anymore and I don't think there's much I can contribute to a conversation about Python libraries or Lucene optimization.

I'm a fan of the DLF Forums. I learn a lot about what tools folks at other institutions are using. But I don't always learn enough about why they use them and what those tools are especially good for -- something I can take back to my own projects and say "Hey, let's consider this solution because it's a great fit for XYZ."

There are things I need to learn about at a pretty deep technical level, but I may never personally apply them. Where do I go for that?

And circling around to another of Dorothea's topics ... I am still too often one of the few women in the room. A recent event I attended had 40 men and 4 women. But then, I am often just the sort of woman who thinks she's not technical enough to attend such events. Perhaps I need to face my own wariness about events like code4lib and just go. And/or stand up alongside others of my kind and start another type of event.

Friday, August 08, 2008

OpenCollection

Via Digital Koans, I came across an open-source collection management system called OpenCollection. From their site:

OpenCollection is a full-featured collections management and online access application for museums, archives and digital collections. It is designed to handle large, heterogeneous collections that have complex cataloguing requirements and require support for a variety of metadata standards and media formats. Unlike most other collections management applications, OpenCollection is completely web-based. All cataloging, search and administrative functions are accessed using common web-browser software, untying users from specific operating systems and making cataloguing by distributed teams and online access to collections information simple, efficient and inexpensive.

...

OpenCollection is intended as an alternative to expensive proprietary software solutions that have traditionally been used for collections cataloguing and publishing by museums, archives, libraries and other organizations.
Having worked for many years in the museum community, with responsibility for the design, care, and feeding of a number of collection management systems, I find this pretty stripped down. It has very strong support for the linking of media files. At first I thought it was lacking elements to manage those fiddly details that are so ubiquitous in managing physical collections -- storage location, exhibition history, publication history, valuation, insurance, condition -- but once I created a test object through the basic entry screen the other screens became visible to me. The only thing I didn't find (but may have missed) is a set of elements having to do with packing and shipping, which require very detailed record keeping. I would also have expected to see more on condition, such as the ability to track a treatment history and document treatments, since there are professional record-keeping requirements for conservators.

One annoyance -- the OpenCollection product site has this ribbon of images that kept crossing on top of the text and blocking it. I'm not even using Firefox 3.0, so who knows what caused this.

issues with blogger?

Has anyone else noticed any odd Blogger behavior? When I put up a new post I don't see it on my blog site for a while, sometimes not until the next day. I first noticed it on July 29. The really odd bit is that I don't see what I just posted on my blog index page, but if I click on the current month in the archive I _do_ see it. I didn't worry about it too much until today, when a colleague in my department said that he had noticed it too, on some other sites as well. The feeds are working fine, but the sites are not.

I thought I'd ask if anyone else has seen anything like this before we lay the blame on our IT environment.

i am rich

The brouhaha over the $999 "I am Rich" iPhone app is very amusing. Eight people bought "I am Rich" -- which presents a glowing, animated red jewel -- before Apple pulled it from the App Store. Is it a scam? Conceptual art? Just something funny to do if you've got the money to burn? Is it an apocryphal story that one of the purchasers didn't mean to buy it?

The best article I found was at the Los Angeles Times. Read it quick before it disappears behind a wall ... There's also this posting on Silicon Alley Insider and this article in the Times.

Wednesday, August 06, 2008

time for links and nothing more

I'm really swamped these days, and only have time to post some links to things that caught my eye during the past week:

vi.sualize.us seems like a really interesting social bookmarking tool for images. Perhaps they'll learn what delicious learned and give up the tortured dots.

William Patry stopped blogging. I'm not surprised if folks thought his personal blog was the word of Google. It's sad that he also decided to erase his archives, but I understand that he didn't want his past postings to live on and continue to be misunderstood.

It seems that Google is making some of its machine-translation technologies and translation management tools available to human translators, at least as a beta. I'm working with a project that requires translation into 7 languages. Managing this process is very challenging, and I've seen some very bad tools for the process.

Following the Digitization and the Humanities Symposium, Jennifer Schaffner and Merilee Profitt wrote a brief report, The Impact of Digitizing Special Collections on Teaching and Scholarship: Reflections on a Symposium about Digitization and the Humanities. The report acts as a summary of the symposium, and also gives some calls to action, especially about metrics for success.

Duke has launched its Open Library Environment Project with Mellon support. Its focus on back-end open tools is worthwhile, but I'm not sure how it will be integrated with other activities in the community.