OK, let's see if we can make this story clear, one step at a time. This post will focus on content. We can save other issues for future posts. With so much scanning going on, it's very easy to get confused. Bottom Line: Book scanning involves many more projects than the few that get most of the attention. BTW, the NY Times ran another story about digitization projects in March.
(An example of a topical collection from the Open Content Alliance -- Illinois Harvest. Includes books about Chicago, Abraham Lincoln, and many other topics.)
1) In this post, we're talking about digitizing books (both in and out of copyright) that are found in library collections. We're NOT talking about material made available from publishers directly to Google Book Search (Google Book Search Partner Program) and Amazon's Search Inside the Book databases. We have found that this difference can confuse people.
2) Some libraries are working with Google/Microsoft/Open Content Alliance.
In fact, both Cornell and the University of California Libraries have announced they will work with both projects. However, compared with the number of libraries in the world (and don't forget about archives, museums, etc.), that's really only a small number. It's sad that money (not a major issue, in this case) and TIME (a key issue) are likely being spent scanning the same titles multiple times. We could all think of other uses for the dollars going to digitize the same title more than once.
The article also points out that both MSFT and Yahoo are members of the Open Content Alliance, and it discusses the pluses and minuses of each program. Here's how we covered it almost two years ago. Then, as today's article notes:
A year after joining, Microsoft added a restriction that prohibits a book it has digitized from being included in commercial search engines other than Microsoft’s.
3) Book digitization is NOT NEW. It's difficult to believe that the NY Times article makes NO mention of Project Gutenberg, which has been digitizing books for over 36 years. That's right, 36 years! BTW, Project Gutenberg Canada launched a few months ago.
4) Keep in mind that access and organization are two different things here. We also know that search habits (for many) will have people searching for phrases like "Dallas Cowboys" or "London Underground" or "New York City Fire Department." We know that most searchers will not use quotation marks to search the words as a phrase. That means millions and millions of hits. This is an excellent example of what constitutes a good part of the invisible or deep web in 2007. True, Universal Search, Onesearch, 3D search, etc., can help but that's another story.
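To make the quotation mark point concrete, here is a minimal sketch in Python. The sample pages and both matching functions are made up for illustration, and real search engines rank and match in more sophisticated ways. It only shows why an unquoted search pulls in far more hits than a phrase search.

```python
import re

# Hypothetical snippets standing in for pages of scanned books.
PAGES = [
    "The London Underground opened its first line in 1863.",
    "An underground river runs beneath parts of London.",
    "Coal miners worked underground; few ever visited London.",
    "Paris has the Metro.",
]

def tokens(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_all_words(query, pages):
    """Unquoted search: every query word appears somewhere on the page."""
    wanted = set(tokens(query))
    return [p for p in pages if wanted <= set(tokens(p))]

def match_phrase(query, pages):
    """Quoted search: the query words appear next to each other, in order."""
    q = tokens(query)
    hits = []
    for p in pages:
        words = tokens(p)
        # Slide a window the length of the query across the page's words.
        if any(words[i:i + len(q)] == q for i in range(len(words) - len(q) + 1)):
            hits.append(p)
    return hits

unquoted = match_all_words("London Underground", PAGES)
quoted = match_phrase("London Underground", PAGES)
print(len(unquoted), "pages match the words;", len(quoted), "match the phrase")
```

With these four sample pages, the unquoted search returns three pages and the phrase search returns one. Scale that up to millions of scanned pages and the gap becomes the "millions and millions of hits" problem.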
5) Other issues for other ResourceShelf posts include:
A) Book digitization from companies like:
++ ebrary. (ebrary Discover offers more than 20,000 full text books for free. Pay only to copy or print a page.)
++ NetLibrary, which just passed the 150,000 book milestone and is available free from many public libraries
++ Books 24x7
++ Safari Tech Books (O'Reilly and Pearson)
B) Quality of the scanning and how it appears on the web.
C) The issue of whether people really want to read books on a screen, be it a large monitor, an iPhone, or a Treo.
6) Let's review some projects, services, and where to find digitized books:
+ Online Books Page
Thousands and thousands of FREE, full text books from many sources. If you browse the "What's New" page, you'll see links to freely available full text books -- both old and new -- being digitized by organizations like:
+ American Historical Association
+ John F. Kennedy Library
+ The Online Library of Liberty
+ LibraryIreland
+ Rice University Press
+ Internet Sacred Text Archive
+ Doctortee.net
+ University of Virginia Digital Collections
+ Making of America (U of Michigan), Over 12,000 Volumes
+ Illinois Institute of Technology
And these are just the tip of the iceberg.
In other words, many organizations, and LIBRARIES, are digitizing books.
Publishers Get in the Act: The National Academies Press Offers Thousands of Full Text Books to Search/Read (Unlimited Amount) at No Charge.