Wednesday, 18th October 2006
Cornell Joins Microsoft Book Scanning Project and Other Scanning News And Tools
Let's digitize some books.
That's great/amazing in theory (and don't forget many libraries and archives have their own projects) but you have to wonder how much duplication is going on or will take place. Put another way, how much time is being wasted scanning the same material, both across separate projects (the same book scanned by OCA and GBS) and within a project (the same item from various libraries)? But that's business, I guess. Peter Suber from Open Access News also mentions this same topic.
Yesterday, Cornell University signed a deal to be part of the Microsoft/OCA program.
Today, Microsoft also announced that it has licensed high-speed scanning technologies from Kirtas for its scanning program, which is part of the Open Content Alliance (OCA). The works scanned by Kirtas will become available via Windows Live Book Search starting in early 2007. Cornell librarians will have a hand in choosing which versions of books to scan and in overseeing quality control of the digitization process, according to Cornell.
Schools like the University of California are part of both OCA and Google Book Search. Last year the University of California system announced it would be part of the Open Content Alliance (members include Yahoo and Microsoft) digitization program.
Then, a couple of months ago, the UC system announced it was also joining the Google Book Search program.
From Candace Lombardi's article today:
The project, when complete, will make public domain works, as well as copyright material from publishers who opt-in, freely available through Microsoft's online Web application.
What about Yahoo? In fact, their blog was the first place where the OCA was announced, on October 2, 2005:
Yahoo will index the content and is also funding the digitization of an initial corpus of American literature collection that the University of California system is selecting, Adobe and HP are helping with the processing software, University of Toronto and O'Reilly are adding books, Prelinger Archives and the National Archives of the UK are adding movies, etc. We hope to add more institutions and fine tune the principles of working together.
Microsoft announced its involvement with the Open Content Alliance and Microsoft Book Search a few weeks later.
From SEW Blog, October 26, 2005:
According to [Danielle] Tiedt, Microsoft has currently committed to fund the scanning of 150K books. In the case of these books (public domain content), Microsoft is making deals on their own with libraries (we don't know which ones) who will provide the content. Then, some (but not all of this material, depending on the library and the actual content) will be available as part of the OCA database. Every library that provides a copy of the book for scanning will also receive a file for local use.
Other organizations and schools that are part of the OCA include:
* European Archive
* Internet Archive
* National Archives (UK)
* O'Reilly Media
* Prelinger Archives
* University of California [now also part of Google Book Search]
* University of Toronto
As we pointed out in this post from earlier this week, a good portion (we can't get actual numbers) of content in Google Book Search so far comes as limited preview material direct from the publisher. This is very similar to, if not exactly, what Amazon.com offers with Search Inside the Book. An Amazon.com/OCA hook-up would be very powerful. Let's also not forget that access doesn't guarantee retrievability, especially when it comes to a subject search in a massive database coupled with the poor searching skills many have.
So, that's the story. Confusing? Of course. We also don't think the masses understand (though Google has tried hard to explain) the differences between various types of scans. We hear all the time from people who think that once the project is complete ALL BOOKS (new, old, or in between) will be available from their computer for free. They seem to miss out on the snippet part of the story. In terms of what Google calls limited view books, Amazon.com is also doing a great job with Search Inside the Book.
Btw, say you're online today and want to look at eBooks. Here are a few places to review:
+ World eBook Fair
Free access all of this month. More than 500,000 full text books all in PDF. Rest of the year, $8.95.
+ International Childrens Digital Book Library
Full text books in many languages and a very cool search interface.
Free remote access to more than 20,000 books. All full text and full image. Pay only to print or copy a page. About 25 cents. No limit on how much you can view.
Available free from many libraries. Full text, no limit on how much you can view. Remote access with a library card.
+ The Online Books Page
More than 25,000 full text books from various sources. All free. All public domain material.
+ The OpenBook Library
Cool technology. Reminds me of the "Turning the Pages" technology; see here (NLM) and the British Library (12 full text books).
Over 128,000 titles. Some fee, some free: about 88,000 of the 128,000 titles are free.
+ eBook Locator
See Also: Let’s Scan: The First Contribution from Univ. of Pittsburgh to Open Content Alliance
See Also: Microsoft to offer book search (10/2005)
See Also: A video of the book scanning robot at the University of Toronto in action.
See Also: An article about U of Toronto scanning: Building an Online Library, One Volume at a Time (via WSJ, free)