Sunday, March 11, 2007

Digitizing History

The Business section of today's New York Times has a front-page article by Katie Hafner entitled "History, Digitized (and Abridged)." The article asserts that materials that are not digitized will very likely be completely forgotten by future generations. The historical record will be the poorer for the lack of these materials.

Hafner refers to the many digitization projects that have been undertaken over the last ten years, citing figures that are truly eye-opening. For instance, even the Library of Congress will digitize "perhaps only 10 percent of the 132 million objects held...in the foreseeable future." The situation is the same at the National Archives and at many smaller institutions around the country. The main problem is funding.

Hafner cites the costs of scanning, which are much higher than I thought: "$6 to $9 for a 35-millimeter slide...$7 to $11 a page for presidential papers...$12 to $25 for poster-size pieces." Hafner also points out that the "cost of scanning an object can be a relatively minor part of the entire expense of digitizing and making an item accessible online." The scanning projects my library has undertaken so far have been pretty straightforward--8 1/2" x 11" format, black type on white paper, virtually no images--and it would be impossible to estimate the costs we have incurred. Even so, the projects are very labor intensive.

Public institutions are looking to private benefactors to help underwrite the costs of digitization. Google, for instance, has donated $3,000,000 "to help start an effort led by the Library of Congress that will digitize and share materials around the globe, and has also provided technical resources" to LC. In addition, Google is digitizing books at LC. Some of the other benefactors are Reuters, IBM, and the Andrew W. Mellon Foundation.

An additional constraint for institutions is copyright, which is not a concern when digitizing old materials, but does affect modern items. The problem is particularly acute with recorded music, where there are "a series of gaps in the popular understanding of the nation's musical heritage." Finally, the items that tend to get digitized are those things that are easy to handle, i.e., printed texts that don't require special handling. Benefactors want to see results for their money--the easier the collection is to digitize, the easier it will be to find grant money to fund the project. Even Google focuses most of its digitization efforts on printed books. As with recorded music, this reluctance to initiate digitization of hard-to-handle materials will result in gaps in the historical record.

The risks of not digitizing our entire historical record are real. According to James J. Hastings, director of access programs at the National Archives, "If researchers conclude that the only valuable records they need are those that are online they will be missing major parts of the story...And in some cases they will miss the story altogether."

1 comment:

Betsy McKenzie said...

Dear Marie,
Thank you for a fascinating post! I agree both that items not digitized are becoming lost (already!) and that scholars who only look at digital materials are missing a vital part of the record. This will certainly be an increasing trend, though. We already see this in our students, and sometimes in scholars who should know better. I have an article comparing KeyCite and Shepard's, published in Reference Services Quarterly, which is not available online. That article is overlooked very frequently when people look for publications comparing the two services. This shows how materials that don't come up in an online search are already becoming lost literature. I think that even digital materials that don't easily pop up with a common-vocabulary search will be lost.