Research Paper: Detecting Spam Web Pages through Content Analysis
10 pages; PDF.
by Alexandros Ntoulas, Marc Najork, Mark Manasse, and Dennis Fetterly
Full text of the paper presented at the 15th International World Wide Web Conference (WWW 2006), Edinburgh, United Kingdom, May 2006.
From the abstract:
In this paper, we continue our investigations of “web spam”: the injection of artificially-created pages into the web in order to influence the results from search engines, to drive traffic to certain pages for fun or profit. This paper considers some previously-undescribed techniques for automatically detecting spam pages, examines the effectiveness of these techniques in isolation and when aggregated using classification algorithms. When combined, our heuristics correctly identify 2,037 (86.2%) of the 2,364 spam pages (13.8%) in our judged collection of 17,168 pages, while misidentifying 526 spam and non-spam pages (3.1%).
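The percentages quoted in the abstract can be checked from its raw counts. The short Python sketch below is only an arithmetic check, not the authors' code; the variable names (detection_rate, spam_share, error_rate) are labels chosen here for illustration and do not come from the paper.

```python
def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

# Counts taken directly from the abstract.
judged_pages = 17168    # pages in the judged collection
spam_pages = 2364       # pages judged to be spam
spam_detected = 2037    # spam pages correctly identified
misidentified = 526     # spam and non-spam pages classified wrongly

# Illustrative labels (assumptions, not the paper's terminology).
detection_rate = pct(spam_detected, spam_pages)   # share of spam pages caught
spam_share = pct(spam_pages, judged_pages)        # share of the collection that is spam
error_rate = pct(misidentified, judged_pages)     # share of all pages misclassified

print(detection_rate, spam_share, error_rate)
```

Note that the three figures use different denominators: the 86.2% is measured against the spam pages only, while the 13.8% and 3.1% are measured against the whole judged collection.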