Wednesday, 18th October 2006

Cornell Joins Microsoft Book Scanning Project and Other Scanning News And Tools

Let's digitize some books.

That's great/amazing in theory (and don't forget many libraries and archives have their own projects), but you have to wonder how much duplication is going on or will take place. Put another way, how much time is being wasted scanning the same material, both across separate projects (the same book scanned by the OCA and by Google Book Search) and within a project (the same item from various libraries)? But that's business, I guess. Peter Suber from Open Access News also mentions this same topic.

Yesterday, Cornell University signed a deal to be part of the Microsoft/OCA program.

Today, Microsoft also announced that it has licensed high-speed scanning technologies from Kirtas for its scanning program, which is part of the Open Content Alliance (OCA). The works scanned by Kirtas will become available via Windows Live Book Search starting in early 2007. Cornell librarians will have a hand in choosing which versions of books to scan and overseeing quality control of the digitization process, according to Cornell.

Schools like the University of California are part of both the OCA and Google Book Search. Last year, the University of California system announced it would be part of the Open Content Alliance (members include Yahoo and Microsoft) digitization program.

Then, a couple of months ago, the UC system announced it was also joining the Google Book Search program.

From Candace Lombardi's article today:

The project, when complete, will make public domain works, as well as copyright material from publishers who opt-in, freely available through Microsoft's online Web application.


What about Yahoo? In fact, their blog was the first place where the OCA was announced, on October 2, 2005:

Yahoo will index the content and is also funding the digitization of an initial corpus of American literature collection that the University of California system is selecting, Adobe and HP are helping with the processing software, University of Toronto and O'Reilly are adding books, Prelinger Archives and the National Archives of the UK are adding movies, etc. We hope to add more institutions and fine tune the principles of working together.


Microsoft announced its involvement with the Open Content Alliance and Microsoft Book Search a few weeks later.

From SEW Blog, October 26, 2005

According to [Danielle] Tiedt, Microsoft has currently committed to fund the scanning of 150K books. In the case of these books (public domain content), Microsoft is making deals on their own with libraries (we don't know which ones) who will provide the content. Then, some (but not all of this material, depending on the library and the actual content) will be available as part of the OCA database. Every library that provides a copy of the book for scanning will also receive a file for local use.


Other organizations and schools that are part of the OCA include:
* European Archive
* Internet Archive
* National Archives (UK)
* O'Reilly Media
* Prelinger Archives
* University of California [now also part of Google Book Search]
* University of Toronto

As we pointed out in this post from earlier this week, a good portion (we can't get actual numbers) of content in Google Book Search so far comes as limited preview material direct from the publisher. This is very similar to, if not exactly, what Amazon.com offers with Search Inside the Book. An Amazon.com/OCA hook-up would be very powerful. Let's also not forget that access doesn't guarantee retrievability, especially when it comes to a subject search in a massive database coupled with the poor searching skills many people have.

So, that's the story. Confusing? Of course. We also don't think the masses understand (though Google has tried hard to explain) the differences between various types of scans. We hear from people all of the time thinking that once the project is complete ALL BOOKS (new, old, or in between) will be available from their computer for free. They seem to miss out on the snippet part of the story. In terms of what Google calls limited view books, Amazon.com is also doing a great job with Search Inside the Book.

Btw, say you're online today and want to look at eBooks. Here are a few places to review:

+ World eBook Fair
Free access all of this month. More than 500,000 full text books all in PDF. Rest of the year, $8.95.

+ International Children's Digital Library
Full text books in many languages and a very cool search interface.

+ ebrary
Free remote access to more than 20,000 books. All full text and full image. Pay only to print or copy a page. About 25 cents. No limit on how much you can view.

+ NetLibrary
Available free from many libraries. Full text, no limit on how much you can view. Remote access with a library card.

+ The Online Books Page
More than 25,000 full text books from various sources. All free. All public domain material.

+ The OpenBook Library
Cool technology. Reminds me of the "Turning the Pages" technology; see here (NLM) and the British Library (12 full text books).

+ DigitalBookIndex
Over 128,000 titles, about 88,000 of them free. Some fee, some free.

+ eBook Locator

See Also: Let’s Scan: The First Contribution from Univ. of Pittsburgh to Open Content Alliance

See Also: Microsoft to offer book search (10/2005)

See Also: A video of Book Scanning Robot at the University of Toronto in action.

See Also: An article about U of Toronto Scanning: Building an Online Library, One Volume at a Time (via WSJ, free)

