
Saturday, 24th April 2004

Challenges in Web Search Engines

Web Search--Google
More on the Google/Anti-Semitic Site Story
Important and interesting reads from Seth Finkelstein and Danny Sullivan. I won't comment on this specific issue again, but I do have a couple of comments on the broader issue of search engine manipulation.

Last October, I commented that while most of the press coverage was focusing on paid inclusion (which Google doesn't offer), paid placement, and their potential effects on the web searcher, it was hard to find press coverage pointing out that organic search results can be manipulated too (yes, even Google's results). This manipulation is the nature of the beast (we should learn to deal with it), and another reminder that general web engines are more than just "research tools" in the sense that a librarian might think of Dialog, LN, Factiva, and many others. Finkelstein correctly points out, "Google ranks popularity, not authority. And popularity is a measure which is vulnerable to many games. Any system of evaluation is subject to manipulation." While link analysis is similar in many ways to citation analysis, tools like ISI's Citation Indexes and ISI's Impact Factors are less susceptible to manipulation (though NOT totally free of it) because the universe of material to control is much smaller.
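
To make the "popularity is vulnerable to games" point concrete, here is a toy sketch of a link-based popularity score in the spirit of published PageRank-style link analysis. It is an illustration only, not Google's actual ranking: the function name `link_popularity`, the page names, the damping value and the tiny graphs are all assumptions made up for this example.

```python
def link_popularity(links, damping=0.85, iterations=50):
    """Toy PageRank-style score. `links` maps a page to the pages it links to.

    Hypothetical helper for illustration; real engines combine many signals.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among its outlinks.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outlinks spreads its score over all pages.
                share = damping * score[page] / n
                for p in pages:
                    new[p] += share
        score = new
    return score


# A small "honest" web: nobody links to "target", one page links to "d".
honest = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": ["a"],
    "target": ["a"],
}

# The same web plus 20 made-up pages whose only purpose is to link to "target".
farm = dict(honest)
for i in range(20):
    farm[f"farm{i}"] = ["target"]

before = link_popularity(honest)
after = link_popularity(farm)

print("before:", before["target"] > before["d"])
print("after: ", after["target"] > after["d"])
```

In this toy graph "target" scores below "d" until the throwaway pages are added, after which it scores above "d" even though nothing about its content or authority has changed. Only the link counts moved.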

Let's remember that web engines are also advertising/marketing vehicles. As Danny points out, results appearing in the 20th position are all but invisible to the average searcher. Sullivan's comments remind me of something that happened at a presentation for the book I co-authored with Chris Sherman. A member of the audience told me that Chris and I had failed to mention a large portion of the Invisible Web in our book. After taking a deep breath, I asked her what we forgot. She told me that for many searchers, if it's not in the first five or seven results, it's all but invisible. She was right!

The power searcher needs, first, to be aware of this issue and, second, to use advanced search syntax, careful term selection, specialized databases and other tools to produce more precise result sets. This can help minimize problems. I also think that Teoma's method of determining relevance might be less susceptible to manipulation.

See Also: Challenges in Web Search Engines
This twelve-page paper was written by Dr. Monika Henzinger (Research Director, Google), Dr. Rajeev Motwani (Professor at Stanford) and Dr. Craig Silverstein (Director of Technology, Google). From the abstract, "...article presents a high-level discussion of some of the problems with information retrieval that are unique to web search engines. The goal is to raise awareness and stimulate research in these areas." Content quality, spam, cloaking, duplicate hosts and vaguely structured data are some of the topics discussed.
--
See Also (Full Text, Just Released): Web Spam Taxonomy
From the abstract, "Web spamming refers to actions intended to mislead search engines and give some pages higher ranking than they deserve. Recently, the amount of web spam has increased dramatically, leading to a degradation of search results. This paper presents a comprehensive taxonomy of current spamming techniques, which we believe can help in developing appropriate countermeasures."
