The German Digital Library wants to make millions of books, films, images and audio recordings accessible online. More than 30,000 libraries, museums and archives are expected to contribute their digitized cultural artifacts. The idea, in part, is to compete with Google Books. But will it work?
On a good day this reader gets through as many as 1,216 pages per hour. Hissing quietly, devouring book after book. Now and then it says, "Pffft."
This is a state-of-the-art robot at work. It automatically scans every book placed open in front of it. A slender wedge drops down to the fold, draws in a page from the left and the right by suction and lifts them. The pages are photographed, and with a gentle puff of air -- pffft -- the robot turns to the next one.
So it goes, day after day, at the Munich Digitization Center of the Bavarian State Library. Some 45,000 works have been scanned -- from the "Nibelungenlied" on parchment to an original score from the hand of Gustav Mahler.
[Snip]
The first trial version [of the German Digital Library (Deutsche Digitale Bibliothek, or DDB)] may go online in 2011 -- "and that will only be for a restricted group of users," says Ute Schwens, a director of the German National Library in Frankfurt, which is coordinating the DDB.
[Snip]
Rolf Griebel, Director General of the Bavarian State Library, who regards the project as "good and overdue," nevertheless warns against over-ambitious plans. "I have real doubts about whether the DDB can be filled with content properly and within a reasonable timeframe," he says.
Griebel estimates that scanning a book from the 16th or 17th century costs between €70 and €140, depending on the amount of work. Contemporary titles are cheaper, but the quantities involved are enormous. The German Library Association is proposing to digitize around 5.5 million volumes in the first 10 years. That would cost at least €165 million. But where is the money supposed to come from?
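The figures above imply a far lower cost per volume than Griebel's estimate for early printed books. A rough back-of-the-envelope check follows, assuming the €165 million is spread evenly across all 5.5 million volumes; the article does not say how the estimate was built, and the variable names are purely illustrative.

```python
# Figures taken from the article.
volumes = 5_500_000             # volumes proposed for the first 10 years
total_cost_eur = 165_000_000    # stated minimum total cost
old_book_low_eur = 70           # Griebel's range for 16th/17th-century books
old_book_high_eur = 140

# Assumption: the total is spread evenly over every volume.
implied_cost_per_volume = total_cost_eur / volumes

# For comparison only: what the same quantity would cost if every
# volume were as expensive to scan as an early printed book.
cost_if_all_old_low = volumes * old_book_low_eur
cost_if_all_old_high = volumes * old_book_high_eur

print(f"Implied cost per volume: EUR {implied_cost_per_volume:.2f}")
print(f"At early-book rates: EUR {cost_if_all_old_low:,} to EUR {cost_if_all_old_high:,}")
```

On these assumptions the proposal works out to about €30 per volume, which fits the remark that contemporary titles are cheaper to scan.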
[Snip]
And the search technology will be more sophisticated than the simple term lookup offered by Google. The DDB collections (under the current plan) will be indexed according to a range of criteria -- place, time, subject area. Such an index can only work if the objects are described in detail.
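An index of this kind can be pictured as a lookup from each criterion and value to the objects that carry it. The sketch below is a minimal illustration in Python, not the DDB's actual design: the class `FacetIndex`, the object identifiers and all metadata values are hypothetical.

```python
from collections import defaultdict


class FacetIndex:
    """Toy index keyed by (criterion, value) pairs such as ("place", "Munich")."""

    def __init__(self):
        # Maps (criterion, value) -> set of object identifiers.
        self._index = defaultdict(set)

    def add(self, object_id, **criteria):
        """Register an object under every criterion it has been described with."""
        for name, value in criteria.items():
            self._index[(name, value)].add(object_id)

    def search(self, **criteria):
        """Return the objects that match all of the given criteria."""
        if not criteria:
            return set()
        matches = [self._index.get(pair, set()) for pair in criteria.items()]
        return set.intersection(*matches)


# Made-up records, for illustration only.
index = FacetIndex()
index.add("obj-1", place="Munich", time="19th century", subject="music")
index.add("obj-2", place="Munich", time="13th century", subject="literature")
index.add("obj-3", place="Frankfurt", time="19th century", subject="music")

print(index.search(place="Munich", subject="music"))
```

An object that was never described by place or subject simply cannot be found through those criteria, which is why the quality of the descriptions matters so much.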
In this effort the DDB has the benefit of some basic technology from the German government-funded Theseus program. Researchers at Theseus have been working since 2007 on methods of indexing images, films, audio recordings and books. If the computer has a rudimentary understanding of what a document contains, it can fill out several descriptive fields automatically -- indispensable for the vast quantities of documents the DDB will have to contend with.