Monday, 22nd October 2007
NY Times, Book Scanning, and Lots of Resources
OK, let's try to make this story clear, one step at a time. This post will focus on content. We can save other issues for future posts. With so much scanning going on, it can be very easy to get confused. Bottom Line: Book scanning involves many more projects than the ones that get most of the attention. BTW, the NY Times ran another story about digitization projects in March.
The Story: Libraries Shun Deals to Place Books on Web
The focus is on Google Book Search, Live Book Search from Microsoft (when was the last time you visited that service?), and the Open Content Alliance.
(An example of a topical collection from the Open Content Alliance -- Illinois Harvest. Includes books about Chicago, Abraham Lincoln, and many other topics.)
+++ List of Open Content Alliance Contributors
+++ Google Book Search Library Partners
1) In this post, we're talking about digitizing books (both in and out of copyright) that are found in library collections. We're NOT talking about material that publishers make available directly to Google Book Search (through the Google Book Search Partner Program) or to Amazon's Search Inside the Book databases. We have found that this distinction can confuse people.
2) Some libraries are working with Google/Microsoft/Open Content Alliance.
In fact, both Cornell and the University of California Libraries have announced they will work with more than one of these projects. However, when you look at the number of libraries in the world (and don't forget about archives, museums, etc.), the number taking part is really quite small. Sadly, what is likely happening is that the same titles are being scanned multiple times, which costs money (not a major issue, in this case) and TIME (a key issue). We could all think of other uses for the dollars going to digitize the same title more than once.
As the article also points out, both MSFT and Yahoo are members of the Open Content Alliance, and the article discusses the pluses and minuses of each program. Here's how we covered it almost two years ago. Since then, as today's article notes:
A year after joining, Microsoft added a restriction that prohibits a book it has digitized from being included in commercial search engines other than Microsoft’s.
3) Book digitization is NOT NEW. It's difficult to believe that the NY Times article makes NO mention of Project Gutenberg, which has been digitizing books for over 36 years. That's right, 36 years! BTW, Project Gutenberg Canada launched a few months ago.
4) Keep in mind that access and organization are two different things here. We also know that many people will search for phrases like "Dallas Cowboys," "London Underground," or "New York City Fire Department," and that most searchers will not put quotation marks around the words to search them as a phrase. That means millions and millions of hits. This is an excellent example of what constitutes a good part of the invisible or deep web in 2007. True, Universal Search, Onesearch, 3D search, etc., can help, but that's another story.
5) Other issues for future ResourceShelf posts include:
A) Book digitization from companies like:
++ ebrary (ebrary Discover offers more than 20,000 full text books for free; you pay only to copy or print a page.)
++ NetLibrary, which just passed the 150,000-book milestone and is available free from many public libraries
++ Books 24x7
++ Safari Tech Books (O'Reilly and Pearson)
B) Quality of the scanning and how it appears on the web.
C) Whether people really want to read books on a computer screen, be it a large monitor, an iPhone, or a Treo.
6) Let's review some projects, services, and where to find digitized books:
+ Online Books Page
Thousands and thousands of FREE, full text books from many sources. If you browse the "What's New" page, you'll see links to freely available full text books -- both old and new -- being digitized by organizations like:
+ American Historical Association
+ John F. Kennedy Library
+ The Online Library of Liberty
+ Rice University Press
+ Internet Sacred Text Archive
+ University of Virginia Digital Collections
+ Making of America (U of Michigan), Over 12,000 Volumes
+ Illinois Institute of Technology
And these are just the tip of the iceberg.
In other words, many organizations, including LIBRARIES, are digitizing books.
Info pros should know about a variety of sources. Here are a few more:
+ International Children's Digital Library
Both old and new books. Free Access. Fun for all!!!
+ Digital Book Index
130,000 titles listed, over 100,000 free. Also note the list of organizations providing content in the right rail.
+ World Public Library
Over 500,000 titles, searchable, available for a very small yearly fee.
+ Internet Archive--Texts
Comprises several projects and has the same leadership as the Open Content Alliance. Also, many titles are available in several formats, from simple text to HTML to PDF.
+ UK: Full text books and cool technology from the Turning the Pages service at The British Library.
+ UK: British Library books go digital
+ Shakespeare Full Text and Full Image on the Web
Some gorgeous work.
Want more? Projects from around the globe? Dave Mattison's British Columbia International Digital Library is the place to begin.
Start browsing here and here. Wow!!!
Publishers Get into the Act: The National Academies Press Offers Thousands of Full Text Books to Search and Read (Unlimited Amount) at No Charge.
See Also: Bradley on Changes at Google Book Search: Google Book Search Improved(?) (via SEL)
See Also: an article about U of Toronto Scanning: Building an Online Library, One Volume at a Time (via WSJ, free)
See Also: 2004 Video of Book Scanning Robot at University of Toronto