Web Search
Source: The New York Times The Coming Search Wars
John Markoff reports on Microsoft's move into the web search world and what Google is working on. The article also mentions Yahoo. Key passages from the article:
+ "Bill Gates, the chairman of Microsoft, stated his admiration for the 'high level of I.Q.' of Google's designers. 'We took an approach that I now realize was wrong,' he said of his company's earlier decision to ignore the search market. But, he added pointedly, 'we will catch them.'"
+ "Google has been quietly developing what industry experts consider to be the world's largest computing facility. Last spring, Google had more than 50,000 computers distributed in over a dozen computer centers around the world. The number topped 100,000 by Thanksgiving, according to a person who has detailed knowledge of the Google computing data center. The company is placing a significant bet that Microsoft will be hard pressed to match its response time to the ever increasing torrent of search requests."
+ "Google has embarked on an ambitious secret effort known as Project Ocean, according to a person involved with the operation. With the cooperation of Stanford University, the company now plans to digitize the entire collection of the vast Stanford Library published before 1923, which is no longer limited by copyright restrictions. The project could add millions of digitized books that would be available exclusively via Google." ResourceShelf has heard of other efforts like Project Ocean.
A Couple of Comments About Google
I'll admit it, I'm tough on Google. However, as market leader and THE tool that's equal to research for many people, it needs to be watched closely by those of us in the info profession. Google is a fine product and has done good things for web search. However, it's not THE solution.
--
1) Google needs to fix several advanced search problems. These are things that should work. For example, having OR work properly. Greg Notess points out many of them. An advanced feature that would be nice to have is an option to turn Google's new autostemming feature on or off.
2) Google's page estimates haven't been close to accurate for many months. I've been told that they're "just estimates." However, can't estimates be somewhat accurate? Many people use these numbers (not a good idea) as a way of determining the popularity of whatever they're searching.
3) Google's advanced search documentation should state that longer HTML web pages (over 101kb) and PDF files (over 120kb) are indexed only up to those points. If what you need to find is beyond that arbitrary limit, you will not find it. Why won't Google state that they have size restrictions? This document is over 150 pages but fewer than 100 pages have been crawled and indexed.
5) Google needs to work harder to remove spam and many duplicates and near duplicates from the database. If that doesn't work, how about clustering them underneath one representative page?
6) I've been a Blogger user for a couple of years. I'm also supposed to have better access to customer service since I'm a former Blogger Pro customer. I had a problem about a month ago (it's been solved) but I've yet to hear back from Blogger's customer service. I've also heard from a few people that service for Google Appliance customers might also leave a bit to be desired.
8) When the company announces a new service, it should work properly. Three weeks ago they announced a shortcut allowing you to enter an airline name and flight number. However, it doesn't work for many airlines (including two major airlines serving Google's home state, California) or for some flights that have a four-digit flight number. I told Google about the problem within hours of the service being launched.
9) In late August, IEEE announced that Google was crawling abstracts from their publication database. According to the news release, the project would be completed by September. That was five months ago and just a very small percentage of IEEE material appears in Google. What happened? Also, even if the material is in the Google database, will the average searcher be able to find it if it's on the third page of results? Would they be better off going to a specialized database in the first place?
10) Those of you who search for news on the open web shouldn't forget that tools like NewsNow, Rocket News, and Yahoo News crawl more news sources than Google News does. BTW, the image database in place at AltaVista/AllTheWeb is first rate, and Ask.Com's Smart Answers (placing answers, not only links, on the results page) shows lots of promise.
11) Another web search company that we're likely to hear from this year is Dipsie. Jason Weiner, Dipsie's CEO, tells me that the company plans to be online by the end of the year with an index of somewhere between 3 and 5 billion pages. Here's a recent mention of Dipsie in the Financial Post.
12) In 2001, Google spokesperson David Krane told News.Com, "...we've firmly established ourselves as the No. 1 search service on the Internet, and this can be attributed to our laser-like focus on a search-only business model." It's obvious that the laser-like business model is gone. The company has many constituencies to please and will have even more once they go public. Is Google doing what AltaVista, Excite and so many others did by trying to become all things to all people?