Sunday, 30th November 2003
Odd Google Results and Web Search Techniques
Source: PSU Live
New Study: "No 'ifs,' 'ands' or 'buts': simple queries work best"
This is a news release about a recently published study by Bernard J. Jansen at Penn State University. From the release: "Searching experts who tout the benefits of using advanced query markers have it wrong: Web searches with 'and,' 'or,' 'must appear' and 'phrase' fare no better than simple, dressed-down submissions, according to a Penn State researcher." I haven't had a chance to read the complete article, but allow me to make a few comments about the summary.
* This study was conducted over two years ago using Excite. Search engines have grown in size since then. Btw, the Excite database/technology of 2001 is no more; Excite is now a meta-search tool. Of course, discussion of this topic should also consider query length (the average query remains at about 2.4 terms) and search term selection (does the patron want info about Golden Retriever Rescue in Toronto but start his or her search with "dogs"?).
* Using "AND" in your search query is redundant. All engines use an implied "AND" between terms.
* Operators like "must appear," which are typically used to reduce the number of results, sometimes increased them. This search term weighting option never made sense to me: either you want the term or you don't.
* True, common phrases are often "understood" by some web engines, but for uncommon names, phrases, and sentences I still find searching with quotation marks to be useful. Quotes can also be useful when searching for stopwords. Compare the query "to be or not to be" Shakespeare with the unquoted query Shakespeare to be or not to be.
* AllTheWeb rewrites common phrases by automatically adding quotation marks. This is the default behavior.
* This summary makes no mention of additional limiting syntax like site: , filetype:, and AltaVista's NEAR (10 words in either direction) operator. IMHO, these remain valuable limiting options for the advanced searcher and information professional.
* As Greg Notess documents, Google's OR operator and other syntax continue to behave erratically.
* The article summary concludes with comments about web page designers needing to include relevancy keywords. Again, without seeing the complete article I'm a bit up in the air about what this means. If it means using the terms the average person uses to describe a service, item, or topic on your web page, then this is good advice. However, if it means using keywords in the meta-tag section of the page, this is a waste of time. No major web engine pays attention to the meta-keyword tag anymore.
* Of course another way to increase precision is to find a specialized or niche database devoted to a topic. Dr. Lee Giles, also at Penn State, is doing impressive work in developing "niche" databases using open web content.
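To make the implied "AND" and the stopword points above a bit more concrete, here is a minimal Python sketch of a toy matcher. It is not how any real engine works: the stopword list, the sample documents, and the function names (tokenize, parse_query, contains_phrase, matches) are all made up for illustration. It only assumes that loose terms are ANDed together with stopwords dropped, while quoted phrases keep every word.

```python
import re

# Hypothetical stopword list for illustration; real engines use their own.
STOPWORDS = {"to", "be", "or", "not", "the", "a", "of"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def parse_query(query):
    """Split a query into quoted phrases and loose terms."""
    phrases = [tokenize(p) for p in re.findall(r'"([^"]+)"', query)]
    rest = re.sub(r'"[^"]+"', " ", query)
    # Loose terms lose their stopwords; quoted phrases keep every word.
    terms = [t for t in tokenize(rest) if t not in STOPWORDS]
    return phrases, terms

def contains_phrase(tokens, phrase):
    """True if the phrase's tokens appear consecutively in tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def matches(doc, query):
    """Implied AND: every loose term and every phrase must be present."""
    tokens = tokenize(doc)
    phrases, terms = parse_query(query)
    return (all(t in tokens for t in terms)
            and all(contains_phrase(tokens, p) for p in phrases))

# Made-up sample documents.
docs = [
    "Shakespeare wrote Hamlet, which includes the line: to be or not to be.",
    "Shakespeare festival tickets are on sale now.",
]

quoted = [d for d in docs if matches(d, '"to be or not to be" Shakespeare')]
unquoted = [d for d in docs if matches(d, "Shakespeare to be or not to be")]

print(len(quoted), "match with quotes;", len(unquoted), "match without")
```

In this toy setup the quoted query matches only the Hamlet document, while the unquoted one collapses to the single term "shakespeare" once the stopwords are dropped, so it matches both.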
See Also: While I'm on the web search beat, Google is full of odd results these days. Here are a few examples I've noticed recently.
Search 1: Christmas
Result #8 is for librarian, Ex Libris editor, and ResourceShelf friend Marylaine Block. She is a cheerful and merry person, but how her page got to this position has stumped a couple of web search experts. The cached version will show that the term is not on the page.
NOTE: Since I alerted Marylaine to this story, she has added the term Christmas to her page. Here is a screen capture of the web page from 12/3/03 illustrating what I first found.
Search 2: apple tree
Result #1 is for the National Physical Lab in the UK. No mention of either term on the web page.
Search 3: television history united kingdom
Result #1 is a page that doesn't contain any of the search terms. Yes, it's for a TV company, but it still falls far short of a relevant result.