Web site policy makers who use robots.txt files as gatekeepers to specify what is open and what is off limits to Web crawlers have a bias that favors Google over other search engines, say Penn State researchers whose study of more than 7,500 Web sites revealed Google’s advantage.
That finding was surprising, said C. Lee Giles, the David Reese Professor of Information Sciences and Technology, who led the research team that developed a new search engine, BotSeer, for the study.
“We expected that robots.txt files would treat all search engines equally or maybe disfavor certain obnoxious bots, so we were surprised to discover a strong correlation between the robots favored and the search engines’ market share,” said Giles of Penn State’s College of Information Sciences and Technology (IST).
While the study does not explain why Web policy makers have opted to favor Google, the researchers know the choice was made consciously: a Web site with no robots.txt file gives all robots equal access, so favoring one robot requires writing rules that say so.
As an example, some U.S. government sites favor Google’s bot, Googlebot, followed by Yahoo and MSN, according to the researchers.
“Robots.txt files are written by Web policy makers and administrators who have to intentionally specify Google as the favored search engine,” Giles said.
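To make concrete what intentionally specifying a favored robot can look like, here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt content, the URL, and the helper name `access_by_agent` are hypothetical illustrations, not rules taken from any site in the study. The user-agent names stand in for the crawlers of Google, Yahoo and MSN, and an empty file is used as a stand-in for a site that has no robots.txt at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may fetch everything,
# every other robot is shut out of the whole site.
FAVORS_GOOGLE = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

# Illustrative crawler names and URL (assumptions for this sketch).
AGENTS = ["Googlebot", "Slurp", "msnbot"]
URL = "http://example.gov/reports/"


def access_by_agent(robots_txt, agents, url):
    """Return {agent: True/False} for whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Rules that single out one robot.
result = access_by_agent(FAVORS_GOOGLE, AGENTS, URL)

# No rules at all: every robot is treated the same.
open_result = access_by_agent("", AGENTS, URL)

print(result)
print(open_result)
```

With the hypothetical rules, only Googlebot is allowed; with no rules, all three robots get the same access, which is why any preference has to be written in deliberately.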
See Also: Giles and Students Create BotSeer
BotSeer is a search engine for robots.txt files. Its goal is to provide information about, and access to, robots.txt files throughout the Web by crawling and indexing robots.txt files and related documents. In addition, statistics about favored robots, comments and robot behavior are analyzed and presented.