
Thursday, 15th November 2007

Researching Robots.txt Files: Study shows Google favored over other search engines by webmasters; Say Hello to BotSeer

When Dr. C. Lee Giles writes, we read and learn.

Robots.txt Time: Study shows Google favored over other search engines by webmasters

Web site policy makers who use robots.txt files as gatekeepers to specify what is open and what is off limits to Web crawlers have a bias that favors Google over other search engines, say Penn State researchers whose study of more than 7,500 Web sites revealed Google’s advantage.

That finding was surprising, said C. Lee Giles, the David Reese Professor of Information Sciences and Technology, who led the research team that developed a new search engine, BotSeer, for the study.

“We expected that robots.txt files would treat all search engines equally or maybe disfavor certain obnoxious bots, so we were surprised to discover a strong correlation between the robots favored and the search engines’ market share,” said Giles of Penn State’s College of Information Sciences and Technology (IST).

While the study doesn’t explain why Web policy makers have opted to favor Google, the researchers know the choice was made consciously, because a site with no robots.txt file gives all robots equal access.

As an example, some U.S. government sites favor Google’s bot, Googlebot, followed by Yahoo and MSN, according to the researchers.

“Robots.txt files are written by Web policy makers and administrators who have to intentionally specify Google as the favored search engine,” Giles said.
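To make the idea of a "favored" robot concrete, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt text is a made-up example, not one taken from the study, and the crawler names Slurp and msnbot are used only as illustrative stand-ins for the Yahoo and MSN bots. The helper allowed_agents is a hypothetical name introduced for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that favors Googlebot: Googlebot may crawl
# everything (an empty Disallow means "nothing is off limits"), while
# every other crawler is shut out of the whole site.
BIASED_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def allowed_agents(robots_txt, agents, url="/index.html"):
    """Return {agent: True/False} for whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["Googlebot", "Slurp", "msnbot"]
    # Biased file: only Googlebot is allowed.
    print(allowed_agents(BIASED_ROBOTS_TXT, agents))
    # Empty file (no rules at all): every robot gets equal access.
    print(allowed_agents("", agents))
```

With the biased file only Googlebot is allowed, while the empty rule set treats all three crawlers the same. That matches the point Giles makes: the bias has to be written into the file by someone.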

This news release is based on findings from the paper: Determining Bias to Search Engines from Robots.txt
7 pages; PDF.
Presented at: IEEE/WIC/ACM International Conference on Web Intelligence

See Also: Giles and Students Create BotSeer
BotSeer is a search engine for robots.txt. Its goal is to provide information about and access to robots.txt files throughout the web by crawling and indexing web robots.txt files and related documents. In addition, statistics about favored robots, comments and robot behavior are analyzed and presented.

See Also: Earlier This Year, Another Search Legend, Matt Koll, Now at Revolution Health, Pointed Out What Dr. Giles Writes About on the DailyMed Site (Available Only to Googlebot for Crawling)

See Also: Will SEO Become The Law For Federal Agencies?

