
Sunday, 29th July 2007

Grub Distributed Crawling Will Be Used to Build Wikia While LookSmart Will Power Wikia Advertising; Other Open Source Crawling Tools

Grub's distributed crawling technology, acquired from LookSmart, is now being tested for use in building Wikia. The news came via comments from Wikia/Wikipedia co-founder Jimmy Wales on Thursday.

Distributed crawling? The concept has been applied to web search for more than seven years. A client program is installed on your computer, and during idle time your computer becomes one of many that crawl the web; the results are combined to build the overall database. See also the SETI@home project.
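To make the idea concrete, here is a toy Python sketch of the split between a central coordinator and many volunteer clients. It is an illustration under stated assumptions, not Grub's actual protocol: the names `Coordinator`, `crawl_batch`, `run` and `FAKE_WEB` are hypothetical, and real page fetching is replaced by an in-memory dictionary so the example runs offline.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# A real client would fetch pages over HTTP; this toy version just looks them up.
FAKE_WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/", "http://d.example/"],
    "http://c.example/": ["http://a.example/"],
    "http://d.example/": [],
}


class Coordinator:
    """Central server: hands out batches of URLs and merges what clients report."""

    def __init__(self, seeds, batch_size=2):
        self.frontier = deque(seeds)  # URLs waiting to be crawled
        self.seen = set(seeds)        # URLs already queued or crawled (no duplicates)
        self.index = {}               # the shared "overall database": url -> outlinks
        self.batch_size = batch_size

    def next_batch(self):
        """Pop up to batch_size URLs from the frontier for one client."""
        batch = []
        while self.frontier and len(batch) < self.batch_size:
            batch.append(self.frontier.popleft())
        return batch

    def submit(self, results):
        """Store a client's results and queue any newly discovered links."""
        for url, links in results.items():
            self.index[url] = links
            for link in links:
                if link not in self.seen:
                    self.seen.add(link)
                    self.frontier.append(link)


def crawl_batch(batch):
    """Work a volunteer client would do in idle time: fetch pages, extract links."""
    return {url: FAKE_WEB.get(url, []) for url in batch}


def run(coordinator, num_clients=3):
    """Hand batches to simulated clients in round-robin order until nothing is left."""
    crawled_by = {}  # url -> id of the simulated client that fetched it
    client = 0
    while True:
        batch = coordinator.next_batch()
        if not batch:
            break
        for url in batch:
            crawled_by[url] = client
        coordinator.submit(crawl_batch(batch))
        client = (client + 1) % num_clients
    return coordinator.index, crawled_by


index, crawled_by = run(Coordinator(["http://a.example/"]))
print(sorted(index))
print(crawled_by)
```

In this sketch only the coordinator decides which URLs are new, so no two volunteers crawl the same page; a real system would also need to check results returned by untrusted clients.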

More on that in a moment.

It's important to point out that advertising on the Wikia site will be delivered via a white-label ad serving platform from LookSmart.

This announcement was made two weeks ago.

It will handle both display and text-based ads, and Wikia will be the first organization to use LookSmart's technology for the "management/serving of display ad units utilizing CPM-based pricing."

"We did a lot of due diligence to find a flexible and intuitive ad serving technology that nets the highest revenue and yield," said Gil Penchina, CEO of Wikia. "We discovered in the process that LookSmart's platform and services not only provide dynamic optimization of both our advertisers and backfill networks, but the white label aspect of it fits perfectly with our brand strategy."

Now, back to the Grub crawling story.

Grub technology (see the company's executive summary from 2001) was acquired by LookSmart in January 2003 and is now in early testing for the Wikia project. LookSmart stopped using the technology in 2005, as mentioned in its annual report from April 2006:

We discontinued the use and support of the Grub distributed crawling technology in 2005 in order to reallocate development and support resources to other revenue-generating initiatives in search technology.

Via a recent News.com item:

It's [Grub] meant to operate through open protocol and community collaborative added functions combined with the wiki.

+ Learn More About Grub ||| Monitor the Grub Wiki

History
"Help Grub Search the Past" by Chris Sherman, April 2003 ||| "LookSmart Bets on Distributed Computing" by Stefanie Olsen, News.com
See Also: Chris Sherman has more on Grub/Wikia in a Search Engine Land item posted the other day.

Blasts from the Past
Grub FAQ (12/09/2000). ||| Grub executive summary (April 2001)

Grub Home Pages (Back to 2000)
December 6, 2000 ||| January 30, 2003 ||| June 14, 2004

See Also: A Few Other Open Source Crawling/Search Tools

+ Nutch
Nutch (part of the Lucene project) is used in several places, including the massive U.S. government web harvests, which contain terabytes of data. Another example is UtilitySearch.info.

+ Heritrix

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler.

+ Avi Rappaport's essential SearchTools.com site lists many other open source crawlers and search engines.

Reading
Numerous projects are tackling, or have tackled, web search by building distributed and P2P tools:
+ Emerging Semantic Communities in Peer Web Search

+ Scalable Hybrid Search on Distributed Databases

+ "Challenges in Distributed Information Retrieval" (PDF), From Yahoo Research

+ MINERVA: Collaborative P2P Search

+ Chora: Expert-based P2P Web Search

+ ODISSEA: A Peer-to-Peer Architecture for Scalable Web Search and Information Retrieval

+ Distributed Search in P2P Networks - Internet Computing, IEEE (PDF)

+ Evaluation of Peer Based Web Search

+ Webcast: Social Web Search (Part 2) ||| Slides
Held at Indiana University

Abstract: This talk will present two research projects under way in the Networks and agents Network (NaN), which study ways of leveraging online social behavior for better Web search. GiveALink.org is a social bookmarking site where users donate their personal bookmarks. A search and recommendation engine is built from a similarity network derived from the hierarchical structure of bookmarks, aggregated across users. 6S is a distributed Web search engine based on an adaptive peer network. By learning about each other, peers can route queries through the network to efficiently reach knowledgeable nodes. The resulting peer network structures itself as a small world that uncovers semantic communities and outperforms centralized search engines.

See Also: Learn More and Demo Here

