Showing posts with label publishing. Show all posts

Sunday, March 01, 2009

copyright registries

I attended a great presentation by Siva Vaidhyanathan and James Grimmelmann at Georgetown University last Friday on the Google Book Search settlement. The question that I most wanted to raise during the discussion period (why did the facilitator never call on me?) was about their opinions on the proposed registry. This seems to me to be one of the topics most in need of clarification in the settlement.

I chatted with both of them afterwards. I worry about a potential lack of transparency of the registry's contents and its mode of operation. I have heard Dan Clancy from Google say that it will not be made fully publicly available.

While there, a student from the University of Michigan School of Information mentioned Michigan's IMLS-grant-supported effort to create a Copyright Review Management System to increase the reliability of copyright status determinations for books published in the United States from 1923 to 1963. Last week Lorcan Dempsey blogged about the OCLC Copyright Registry Evidence Initiative. Stanford has a Copyright Renewal Database. John Mark Ockerbloom at the University of Pennsylvania has researched periodical renewals in addition to posting scans of the Catalog of Copyright Entries from many volunteer institutions (including Carnegie Mellon's and Project Gutenberg's extensive work). The U.S. Copyright Office has records from 1978 onward online.

So, where does a library (or anyone, for that matter) go to research the copyright status of a published work? One of these places? All of them? And where might the ownership status of orphan works someday be researched, recorded, and made public? What will be the most authoritative source? Will there be open resources and less open resources? This looks like an area with almost too much competition, a splintering of attention that calls out for coordination in the community.

eReading

Via TeleRead, I found an essay about eReading devices by Jennifer Chapelle on treocentral. The piece, "Centro, iPhone, and that Other Reading Device (Kindle 2)," briefly describes her experiences with a Centro and an iPhone, focusing on the new Kindle 2.

Overall, she liked it. But she's not throwing away her other devices.

If you've ever been interested in getting an eReader type of device, I can definitely recommend the Kindle 2. It's not the cheapest gadget, but it does have a lot of features, and don't forget that 3G Sprint radio inside. If you want an eReader that is thin, lightweight, fast, looks great, has a built-in dictionary and a battery saving sleep-mode with some cool portraits, the Kindle 2 from Amazon is a great choice.

And if you don't care about those eReaders like the Kindle and the Sony device, just stick with your Treo or Centro. Those are great little eBook readers! And we know all the other great stuff you can do on them like talking on the phone, texting, writing documents, listening to music, taking photos, surfing the internet on decent looking web browsers, playing games, etc. My Centro and Treo Pro will be staying right by my side, Kindle or no Kindle.

I saw an interview with Jeff Bezos on Charlie Rose last week, which was primarily a discussion of the Kindle 2. My take-away is that the killer feature for the Kindle is the wireless purchasing of books that does not require a PC. Bezos is also a huge fan of the ability to bookmark your location in a text on your Kindle, and when you pick up another of your Kindles, the devices will sync up and you will find the same bookmark. Interesting, but I'm not sure I understand yet why you would have more than one. One at home and one at work? One downstairs and one upstairs? It's already portable. The functionality that they are working on where you can sync between your Kindle and a reader app on a cell phone and back interests me more. His example was reading on a cell phone while waiting in line at the grocery store, and having your Kindle aware of your new bookmark once you get home. That use case works better for me.

His statement that he wants to deliver "Every book ever in print in any language" gives me pause. That feels potentially monopolistic for the eBook distribution sector -- well, at least for their proprietary AZW eBooks. But if theirs becomes the most successful pipeline for eBooks, will other creators and distributors of other formats be able to compete? I can only assume the open access eBook realm will not fade away.

I found myself looking at the Sony Reader a week ago. The touchscreen and non-touchscreen versions each have their own usability issues. The touchscreen model is the better of the two, and supports annotation. It supports more file formats than the Kindle, but it requires a PC and has no wireless features. And it runs on MontaVista Linux, which a member of my family worked on a couple of years ago.

For now at least I plan to continue to read books on my Centro. I have about 3 dozen books, some recent, some classics. And I haven't divested myself of my nearly 3,000 dead tree books. Or my library cards.

Sunday, November 16, 2008

arl guide to the google book search settlement

The Association of Research Libraries has created "A Guide for the Perplexed: Libraries & the Google Library Project Settlement," a 23-page document intended to help libraries understand the impact of the proposed Google Book Search settlement.

google book search session at dlf

I was going to spend some time transforming my notes from Dan Clancy's session on Google Book Search from the DLF Fall 2008 Forum into more coherent prose, but for the sake of timeliness, I'm going to post them as is.

  • 20% of the content in Google Book Search is in the public domain, 5% is in print, and the rest is in an unknown “twilight zone” -- unknown status and/or out-of-print.
  • 7 million books scanned, over 1 million are public domain, 4-5 million are in snippet view.
  • Early scanning was not performed at an impressive rate, and it took way longer than expected to set up.
  • Priorities are improving search quality and exposure through google.com.
  • Search is definitely not solved and “done,” and is harder given the big distribution of relatively successful hits.
  • They are working to improve the quality of scanning and the algorithm to process the books and improve usability. They admit that they still have work to do, especially with the re-processing of older scans.
  • Their data supports the Long Tail model.
  • Creating open APIs, including one to determine the status of a book, and a syndicated viewer that can be embedded.
  • Trying to identify the status of orphans, and release a database of determinations. But institutions need to use determinations to guide their decisions, not just follow them because “Google said so.”
  • On the proposed settlement agreement: Google thought they would benefit users more to settle than to litigate.
  • The class is defined as anyone in the U.S. with a copyright interest in a book, for U.S. use (no journals or music).
  • For all books in copyright, Google is allowed to scan, index, and provide varying access models dependent upon the status of the book -- if in print or out-of-print. Rights holders can opt out.
  • 4 access models: consumer digital purchase (in the cloud, not downloads – downloads are not specifically included in the agreement); free preview of up to 20% of a book; institutional subscription for the entire database (site license with authentication, can be linked into course reserves and course management systems); public access terminals for public libraries or higher ed institutions that do not want to subscribe (1 access point in each public library building, some number by FTE for higher ed institutions), which allow printing (for 5 years or $3 million underwriting of payments to rights holders).
  • Books Rights Registry to record rights, handle payments to rights holders. It can operate on behalf of other content providers, not just Google.
  • Plan to open up government documents, because they feel that the rights registry organization will deal with the issue of possible in-copyright content included in gov docs, which kept them from opening gov docs before.
  • Admits that publishers and authors do not always agree if publishers have the rights for digital distribution of books. Some authors are adamant that they did not assign rights, some publishers are adamant that even if not explicit, it's allowed. The settlement supposedly allows sharing between authors and publishers to cover this.
  • What is “Non-consumptive research”? OCR application research. Image processing research. Textual analysis research. Search development research. Use of the corpus as a test corpus for technology research, not research using the content. 2 institutions will run data centers for access to the research corpus, with financial support from Google to set up the centers.
  • What about their selling books back to the libraries that contributed them via subscriptions? They will take the partnership and amount of scanning into account and provide a subsidy toward a subscription. Stanford and Michigan will likely be getting theirs free. Institutions can get a free limited set of their own books for the length of the copyright of the books. They can already do whatever they want with their public domain books.
  • They will not necessarily be collecting rights information/determinations from other projects for the registry. In building the registry, they are including licensed metadata (from libraries, OCLC, publishers, etc.), so they cannot publicly share all the data that will make up the registry. But they will make public the status of books that are identified/claimed as in copyright.
  • If Google goes away or becomes “evil Google,” there is lots of language in contracts and settlement for an out.
  • The settlement is U.S. only because the class in the suit was U.S. only. Non-U.S. terms are really challenging because many countries have no concept of class-action, and there is a wide variation of laws.
  • A notice period begins January 5. Mid 2009 is the earliest time this could be approved by the court.
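One of the bullets above mentions open APIs, including one to determine the status of a book. As a rough illustration of what consuming such an API might look like, here is a sketch modeled on the JSON the later public Google Books API returns; the field names ("accessInfo", "publicDomain", "viewability") and the category labels are assumptions on my part, not something from Clancy's talk.

```python
# Sketch: mapping a volume record, shaped like a Books-API-style JSON
# response, onto the access categories discussed in the session notes.
# The field names used here are assumptions, not a documented contract.

def classify_access(volume: dict) -> str:
    """Classify a volume record into a rough access category."""
    access = volume.get("accessInfo", {})
    if access.get("publicDomain"):
        return "public domain (full view)"
    viewability = access.get("viewability", "NO_PAGES")
    if viewability == "ALL_PAGES":
        return "full view"
    if viewability == "PARTIAL":
        return "limited preview"
    return "snippet or no preview"

# Example record, shaped like an API response
sample = {"accessInfo": {"publicDomain": False, "viewability": "PARTIAL"}}
print(classify_access(sample))  # limited preview
```

The point of the bullet -- and of the code -- is that status determinations arrive as data to be interpreted, not as a final answer; institutions still have to decide what an access category means for their own use.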

Tuesday, October 14, 2008

Frankfurt Book Fair survey on digitization

Via TeleRead, the 2008 Frankfurt Book Fair conducted a survey on how digitization will shape the future of publishing. The summary results are available in a press release.

These are the top four challenges facing the industry identified through the survey:

• copyright – 28 per cent
• digital rights management – 22 per cent
• standard format (such as epub) – 21 per cent
• retail price maintenance – 16 per cent

Without knowing the details behind these concerns in their survey results, the first three, as generalizations, overlap in interesting ways with the challenges facing digital collection building in libraries. What are appropriate terms for copyright and licensing for libraries? How do we identify and document copyright (and other rights) status? How do we manage access and provide for fair use under varying DRM scenarios? What standards will enhance preservation and ongoing access?

Wednesday, August 20, 2008

Registry of U.S. Government Publication Digitization Projects

I didn't know the Registry of U. S. Government Publication Digitization Projects existed:

"The Registry contains records for projects that include digitized copies of publications originating from the U.S. Government. It serves as a locator tool for publicly accessible collections of digitized U.S. Government publications; increases awareness of U.S. Government publication digitization projects that are planned, in progress, or completed; fosters collaboration for digitization projects; and provides models for future digitization projects."

The Registry has recently been updated, and they welcome additions. Institutions need to apply to contribute.

Wednesday, June 11, 2008

book publisher's manifesto

I've been reading Sara Lloyd's "Book Publisher's Manifesto for the 21st Century."

It's a very interesting essay. These sections stood out to me:

We will need to think much less about products and much more about content; we will need to think of ‘the book’ as a core or base structure but perhaps one with more porous edges than it has had before. We will need to work out how to position the book at the centre of a network rather than how to distribute it to the end of a chain. We will need to recognise that readers are also writers and opinion formers and that those operate online within and across networks. We will need to understand that parts of books reference parts of other books and that now the network of meaning can be woven together digitally in a very real way, between content published and hosted by entirely separate entities. Perhaps most radically, we will have to consider whether a primary focus on text is enough in a world of multimedia mash-ups. In other words, publishers will need to think entirely differently about the very nature of the book and, in parallel, about how to market and sell those ‘books’ in the context of a wired world. Crucially, we will need to work out how we can add value as publishers within a circular, networked environment.
and
Publishers need to provide the tools of interaction and communication around book content and to be active within the digital spaces in which readers can discuss and interact with their content. It will no doubt become standard for digital texts to provide messaging and commenting functions alongside the core text, to enable readers to connect with other readers of the same text and to open up a dialogue with them. Readers are already connecting with each other – through blogs, discussion forums, social book-marking sites, book cataloguing sites and wikis. Publishers need to be at the centre of these digital conversations, driving their development and providing the tools for readers to engage with the text and with each other if they are to remain relevant.
The idea that texts exist as networked content that can be broken down into components that can be recombined with other networked content in a multitude of contexts is a huge focus in digital humanities scholarship. Remixing and recontextualization through mashups isn't just a scholarly activity by any means. Anybody who has bought a song from iTunes and added it to a playlist has taken a single component from a larger whole that was once considered the only possible unit of distribution (an album) and recontextualized it (a personal thematic playlist).

Publishers are finally beginning to understand that the "book as unit" model is no longer the only model for distribution -- in fact, it will soon no longer be the dominant model for any media distribution.

That said, I hope publishers don't throw the baby out with the bathwater in the rush to identify new paradigms for digital distribution and reading on the screen. I still buy and read books. They are still a content unit with meaning. Publishers need to think about how they will continue to distribute books, but in a way that they can be consumed and retained as a whole OR broken down into components for consumption and re-use.

Another major topic in the manifesto is one that I have never given any conscious consideration -- do book buyers give any thought to publisher brands? Her answer is no, they do not. I sometimes take note of the publisher or line -- Vintage Crime for example -- because I have come to associate their line with titles that I have enjoyed in the past, so I'm more likely to look at one of their books on the shelf now and in the future. I have a lot of books from Tuttle and Kodansha because they publish Japanese fiction in translation. Anyone who has shopped at the New England Mobile Book Fair in Newton Highlands, Massachusetts, knows they arrange their stock by _publisher_, so you'd better have that noted in your WTB lists when you go there. But why else would anyone ever give any thought to the publisher?

My recognition of Vintage Crime, Kodansha, and Tuttle is proof that one of Ms. Lloyd's suggestions for publishers -- deep genre niche brands -- is not far off the mark. And she suggests that publishers need to market directly to their consumers rather than letting the next link in the distribution chain, e.g., booksellers, become the recognized brand through their marketing efforts. Publishers need to learn the basics of digital promotion in addition to digital distribution. I can think of a lot of book promotion experts I know -- Bella Stander, Kevin Smokler, and M. J. Rose -- who would agree.

Tuesday, April 15, 2008

OR08 repository case studies

The Repositories Support Project has released two dozen UK, European, and North American digital repository case studies that were prepared for the Open Repositories 2008 conference. I thought it was a great idea when they solicited these for the conference, and I am very happy that my UVA case history is included.

Monday, March 10, 2008

HP BookPrep

Via ReadWriteWeb, I came across HP BookPrep, a prototype print-on-demand service for, and this is really a quote, "every book ever published." They work from page images and process them through their own pipeline into PDF "eMasters" for printing. It's an interesting prototype.

The pilot collection is Foodsville, a food and cooking community site. Members can read and purchase cookbooks at the site's free library, where books can be discovered by keyword, by author, or by browsing through tags. Of course, they're not actually "free" -- the print-on-demand costs ranged from $7 to $32 when I browsed through the 141 titles currently available. You don't have to buy the books, though -- you can read them online. I browsed through Lafcadio Hearn's La Cuisine Creole and found it readable. (I love the recipe title "Delicate Rusks for Convalescents" on page 235.)

I'm a fan of Lafcadio Hearn, so I checked Amazon, and found that the same POD version is available for $10.26 versus $13.96 on Foodsville, both listed as marked down from $19.95. There is what looks to be a different POD version available from Amazon as well -- also a facsimile of the 1885 edition -- for $21.24. I'm not sure which I'd choose.

The site does not say where the page images come from. I'd really like to know that.

Friday, January 25, 2008

Cal Berkeley pilot to subsidize open access fees

Via OA Librarian, the U.C. Berkeley Research Impact Initiative (BRII) is launching an 18-month pilot program to subsidize, to varying degrees, the fees charged to authors who select open access or paid-access publication. The pilot will also yield data that can be used to gauge faculty interest in -- as well as the budgetary impact of -- these new modes of scholarly communication on the Berkeley campus.

TheAtlantic.com

I was initially very excited by the announcement on BoingBoing that The Atlantic had opened its archive. I read the Editor's Note describing the decision. I followed the link to start my exploration.

It's a little misleading. The _site_ is now open to all. They have "Unbound" (web-only) content and full issues back to 1995 open. But the rest of their free content seems to be selected material. Some things that I looked for are there (Vannevar Bush's "As We May Think"); I've seen other people's examples of known articles with no results. There is still a "Premium Archive" for the full content going back to 1857.

Still, there's some great content. Try the search at http://www.theatlantic.com/a/search.mhtml

Monday, January 21, 2008

American Literatures

American Literatures is a Mellon Foundation-funded project in which five university presses -- NYU, Fordham, Rutgers, Temple, and Virginia -- have established an initiative designed to create new opportunities for publication in humanistic scholarship. The most innovative aspect of the program is the establishment of a shared, centralized, external editorial service dedicated solely to managing the production of books in the initiative. This service will handle all copyediting, design, layout, and typesetting, and manage each title through to the point where it is ready for printing. The initiative has a web site and has announced on its "about" page (scroll down) the areas in which each press is soliciting submissions.

I look forward to watching how this progresses for the UVA Press.

Friday, January 18, 2008

smARThistory

There's an interesting project out of the Fashion Institute of Technology to develop an open access online art history text using interactive technologies -- smARThistory. It's being developed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.

Thursday, January 10, 2008

print-on-demand from open access books

PublicDomainReprints.org is offering an experimental non-commercial service that allows users to convert digital public domain books in the Internet Archive, Google Book Search, or the Universal Digital Library to print-on-demand using the service Lulu.com. You pay for the service through Lulu.

For example, you paste in a URL for a public domain volume from Google Book Search, and the process takes advantage of the existing PDF for production. Apparently the Internet Archive PDFs can't be used, so the process works from the DjVu files instead.
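Since the service is driven by a pasted URL, a first step is presumably pulling the volume identifier out of it before fetching the PDF. A minimal sketch of that step, using only the standard library (the `id` query parameter is how Google Book Search URLs identify a volume; the function name is mine):

```python
# Sketch: extracting the volume ID from a pasted Google Book Search URL,
# the kind of first step a URL-driven POD service would need.
from typing import Optional
from urllib.parse import urlparse, parse_qs

def google_books_volume_id(url: str) -> Optional[str]:
    """Return the 'id' query parameter from a Google Books URL, if any."""
    query = parse_qs(urlparse(url).query)
    ids = query.get("id")
    return ids[0] if ids else None

print(google_books_volume_id(
    "http://books.google.com/books?id=ABC123xyz&printsec=frontcover"))
# ABC123xyz
```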

There's a blog post mini-interview with the founder, who has his own blog.

Nature archive online

The full text of all Nature articles back to the first issue in 1869 is now online. I wish it weren't by subscription only, but we have access at UVA and it is remarkably cool. That first issue includes a book review for M. Madsen's Antiquités préhistoriques du Danemarck on Danish Iron Age sites that makes me want to look for a copy so I can see the beautifully described illustrations.

Wednesday, December 26, 2007

NIH Open Access mandate is law

Via Peter Suber, President Bush signed the omnibus spending bill that includes the requirement that publications based on NIH-funded research be submitted to PubMed Central and made publicly available, with a no more than 12-month embargo.

That said, there is no reason to wait to start depositing. Articles should be deposited upon publication with restricted access for one year, rather than waiting a year to deposit (who keeps track of embargoes? -- "Today it's been a year and I should deposit that article"). The same articles should additionally be self-archived in each researcher's own institutional repository. If their institution does not have an IR, they should ask "Why not?"
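The embargo bookkeeping that parenthetical pokes fun at is exactly the kind of date arithmetic that is easy to get wrong by hand. A hypothetical helper (the name and interface are mine, not NIH's) that computes when a 12-month embargo lifts:

```python
# Hypothetical helper for the embargo bookkeeping joked about above:
# given a publication date, when does a 12-month embargo lift?
from datetime import date

def embargo_lifts(published: date, months: int = 12) -> date:
    """Return the date the embargo expires, clamping invalid days
    (e.g. Feb 30) back to the last valid day of the target month."""
    month_index = published.month - 1 + months
    year = published.year + month_index // 12
    month = month_index % 12 + 1
    day = published.day
    while True:
        try:
            return date(year, month, day)
        except ValueError:
            day -= 1  # clamp: e.g. Feb 29 + 12 months -> Feb 28

print(embargo_lifts(date(2008, 1, 15)))  # 2009-01-15
```

Of course, the post's real point stands: depositing immediately with restricted access removes the need for any of this tracking.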

EDIT at 6:27 PM: Here's the press release from the Alliance for Taxpayer Access.

Monday, December 17, 2007

Open Journal Systems 2.2 released

OJS 2.2 has been released. We have an earlier version up that we've been testing with a student journal. I see some additions that I know folks will find attractive. Strangely, one of the potentially most desirable features is the integration of Google Analytics. I have watched the editors of the journals that we host implement it themselves (sometimes struggling to do so), because we're just not in the business anymore of supplying web stats for every site on our servers. I want to upgrade our test instance to the new version to see how it compares with other solutions that we are considering for the hosting of journals and the submission/review/editing process.

Thursday, December 13, 2007

sharing personal libraries

I am very excited by the news about the joint project between the Zotero group at the Center for History and New Media and the Internet Archive.

Zotero is a very easy-to-use tool for developing personal citation repositories for distributed resources. The creation of a "Zotero Commons" registry of sorts, where materials used by researchers can be shared, is a powerful idea. It's an institutional repository without institutional boundaries. The idea of tying this in to the Internet Archive's archive of the web, so that materials cited but not directly deposited are also not lost, is even more intriguing. That there will be the capacity for both individual and group work is as it should be.

Here's the core of the project to me: "The combined digital collections present opportunities for scholars to find primary research materials, to discover one another’s work, to identify materials that are already available in digital form and therefore do not need to be located and scanned, to find other scholars with similar interests and to share their own insights broadly."

I wonder how this will fit into the landscape with other digital registries and collections. The DLF/OCLC Registry? OAIster? Aquifer? American Memory? What is the relationship between what institutions digitize, what their research communities have deposited in IRs, what is harvested into larger aggregations, and what scholars personally create? This is a problem space that bears a lot more discussion.

Monday, December 10, 2007

print and digital at the New York Times

One of the most insightful blogs on media culture is written by David Byrne. He recently blogged about the new New York Times building, and, in response to his posting, he was invited to meet with various NY Times staff about print and digital journalism. His post on those meetings is one to read.

Thursday, December 06, 2007

supporting data in PQDT

Today we received a downtime notice from ProQuest. This is not unusual, especially during the holidays when services make time to update and upgrade. What cheered my little heart was this notice:

ProQuest Dissertations & Theses (PQDT) Multimedia Support release—ProQuest has seen an increase in dissertations and theses that include supplementary digital materials - audio, video, spreadsheets, etc. To properly support scholarly access to these materials, ProQuest is now making them available online in the Full Text version of the ProQuest Dissertations & Theses (PQDT) database.
Yes, ProQuest is going to include at least some supporting media and data for the theses and dissertations that it makes available. PQDT already has an Open PQDT that includes Open Access ETDs. I cannot find any details about this on the ProQuest site, but it's a promising step.