Wednesday, 6th December 2006
History and Overview: Microsoft Live Book Search (Beta) Now Online; Medical Content Being Added to MS Live Academic
In the Fall of 2005, Microsoft announced that it would release a book search service and, at the same time, joined the Open Content Alliance (OCA).
From the October 2005 announcement:
MSN will first make available books that are in the public domain and is working with the Internet Archive to digitize the material. MSN will then work to extend its offering to other types of offline content. The digitized content will primarily be print material that has not been copyrighted, and Microsoft will clearly respect all copyrights and work with each partner providing the information to work out mutually agreeable protections for copyrights.
More on the October 2005 announcement about Microsoft Book Search in a fact-filled article by Barbara Quint.
In November 2005, the British Library said it was joining the Microsoft project. Then, in June 2006, a Microsoft press release indicated that some books from the University of California and the University of Toronto would be digitized and included in the database. Since then, the University of California announced that it was also going to take part in Google's Library Program -- which is a part of Google Book Search.
Today, about 14 months after MS announced its intentions, Microsoft Live Book Search is online (beta release) at http://books.live.com. Here's the Microsoft Live Search Blog announcement. Microsoft calls this the U.S. release, so it's likely that the service will gradually roll out globally, with the index also adding material in languages other than English. A search with non-English words (here's an example with Spanish words) found only a few books, and those were English language titles that contained the terms.
Of course, look for comparisons with Google Book Search (GBS, which includes Google's Library Program) and Amazon.com's Search Inside the Book (SITB). Competition? Perhaps, for some. However, as we often say on ResourceShelf, the more options and tools information professionals and end users have, the better. Google's CEO, Eric Schmidt, has said that search is NOT a zero-sum game.
For commentary about these services, ResourceShelf recently posted news (new features at GBS) along with complete reviews by Jacso and Mick O'Leary. Greg Notess also has an in-depth look at recent changes at GBS.
Don't forget that many other digitization programs are out there. Heck, Project Gutenberg has been around for 35 years. In this post (at the bottom) we list a few of many projects like The World eBook Library (about 500,000 titles, all in PDF), and the amazing Online Books Page, which is really a well maintained and constantly expanding directory. It even offers an RSS feed of new titles. We also list commercial projects such as ebrary and NetLibrary.
OK, now let's talk about Microsoft Live Book Search (MLBS).
Fast Facts via a News.com Story:
+ While in beta, accessible at books.live.com and via a link on Windows Live (click the more button) at the top of the page.
+ After the beta period, likely 6 months, book content will be added to the main web search index. Interesting.
From the article:
"As we move out of beta, what you will see is that book content integrated with the Web content (search results on Windows Live Search). What we are focusing more of our efforts on for live searching is integrating all of those content types together to give you the most relevant results. Sometimes the most relevant will be from books. If, for example, it's a search on historical content, chances are the most authoritative content may be found in a book," said [Danielle] Tiedt, [the general manager of Live Search Selection for Microsoft].
Note: Right now, Google Book Search lists potential books of interest if the query terms trigger a OneBox at the top of the page. The surefire way to trigger this is by using the words "book" or "books" in the query. Here's an example.
+ Where do books in the MLBS beta release come from?
According to the article, material in this beta includes "noncopyright" books from the collections of the British Library, the University of California and the University of Toronto. Books from the NY Public Library (also a Google Library), Cornell, and the American Museum of Veterinary Medicine will be added in the next month.
+ These books and other public domain books are available to download in PDF.
+ Danielle Tiedt On Copyright:
"We feel very strongly about copyright. All the library scanning we do is noncopyright stuff, and then we work with publishers to produce (copyright) stuff. We don't do any mass scanning of in-copyright works," said Tiedt.
The Search Experience at Live Book Search
+ Just a search box and some text stating:
Find a book, or search within a book. Enter keywords to begin.
+ Help link is located in lower left of page.
+ Results pages provide continuous scrolling as seen in other "Live" products. In other words, as you scroll, more results are loaded. No need to reload the page.
+ Results lists include: title, publication year, author, number of pages, search terms bolded. It appears that traditional stop words such as "the," "a," and "I" are treated as search terms if combined with a non-stop word in the query. For some books, subjects are listed (looks like LC) with each title in the result list. However, these are not seen when looking at book info in the book page view. In some cases, you'll also see that they just trail off. Note the third and fourth results here. As of today, these headings are also not hyperlinked. In some cases, no result count or estimate is given (perhaps not a bad thing) when the result count is large. Then, you'll see something like "1-5 of Thousands."
+ An actual entry page for a specific book is shown in two windows. The left window shows title info, author, publisher, publication year, a box to search within that specific book, a link to download the book as a PDF file, keywords in context, and links to specific pages where the words appear in the book. The right window contains the images of pages from the digitized book, navigation tools (forward, back), and two buttons that will either reduce or increase the size of the scanned image in the window. We were unable to find a way to jump to a specific page. We were also unable to find links that go directly to key pages like the TOC, index, title page, etc. Both would be useful, as would various types of advanced search options listed on the page and documented in the syntax. MLBS pages currently contain no paid search results.
+ Scanning looks nice from what we've seen. Each page also states, "Digitized by Microsoft." Like other services, books in PDF format are offered one page at a time. For example, jumping from one chapter to another can only be accomplished by first looking at the TOC and then jumping to the specific page.
Since You're Probably Wondering...
Duplication with Google Book Search at least with this first beta? We searched for ten random titles. We alternated our starting point between GBS and MLBS. Note: At Google, we limited searches to "full view" books. What's a full view book?
+ The Cultivated Man (1915)
Available in both MLBS and GBS
+ The Book of Athletics (1914)
+ New York Panorama (1938)
+ A Manual of Nursing (1894)
+ A Historical Companion to Hymns Ancient and Modern (1903)
+ Education and the Philosophical Ideal (1900)
+ A Forest Orchid and Other Stories (1902)
+ Laws and Ordinances Governing the City of Chicago (1873)
+ The Stolen Story and Other Newspaper Stories (1899)
MLBS and GBS
+ A Civil Service Manual (1912)
Surprising results? Hardly. With the millions and millions of books out there, seeing only two duplicate titles (again, we limited to GBS full view) is no surprise at this point. It will be interesting to watch over time as more and more public domain books are digitized and made available in both databases.
+++ In other Microsoft News...
Microsoft is adding medical content to its Live Academic Search tool. It's been months since we've heard or read anything about Live Academic. Actually, we've heard almost nothing since it launched (beta) in April. A few weeks later, Jacso gave it one of his thorough reviews. According to Tiedt, today's release of medical journal content will "practically quadruple" the amount of content available. As of this hour (1pm), the list of journals indexed has yet to be updated. Here's a cached version of the page for reference. It looks like the page hasn't been touched (better organization, dupe removal, etc.) since it first went online in April.
In an article about the Open Content Alliance on News.com, Brewster Kahle, founder of the OCA and Internet Archive, said:
Microsoft was an early supporter of the OCA and in June worked with it on a project scanning and indexing materials from the University of California and the University of Toronto libraries as part of its Windows Live Book Search project. But Microsoft has become more proprietary in recent months, Kahle said.
"We continue to work with Microsoft, but the results going forward are not strictly OCA principles," Kahle later added in an e-mail. "To their credit, they are interested in helping get more scanning done in the open, of course because they can use the books as well, but still, this is more than other projects."
1) Nice to have MLBS online. More content, cool! However, it's in need of more features like direct links to key portions of a book and perhaps hyperlinked subject headings. Advanced search, too! Yes, this is day one of a beta so let's give it some time.
2) It's always good to have more than one search tool available. :-) You've heard that before. It would be a good thing if many of the large digitization projects could work together to avoid scanning the same book many times (saving time, money, effort, and resources, and getting more done in a shorter amount of time). But that's highly unlikely. This is business and that's the way it goes. Actually, since Microsoft is part of the Open Content Alliance, the public domain titles that MS has digitized should be available to other OCA members like Yahoo, OCLC/RLG (NetLibrary), etc. as a package, rather than by downloading each title individually (which you could do with both GBS and MLBS). The OCA FAQ says that metadata will be available for harvesting using OAI and RSS. Whether or when they will use them is something else to watch out for. Also, will the OCA release its own search interface, perhaps on the Internet Archive site?
3) It's taken Microsoft seven months to do something new with Live Academic. Will it be another seven months for more? Will this be the case for MLBS?
4) One service that would be useful from ALL book digitization projects would be a regularly updated list (perhaps a feed) of new titles as they are added to the various databases.