
Monday, 6th August 2007

New program color-codes text in Wikipedia entries to indicate trustworthiness


The online reference site Wikipedia enjoys immense popularity despite nagging doubts about the reliability of entries written by its all-volunteer team. A new program developed at the University of California, Santa Cruz, aims to help with the problem by color-coding an entry's individual phrases based on contributors' past performance.

The program analyzes Wikipedia's entire editing history (nearly two million pages and some 40 million edits for the English-language site alone) to estimate the trustworthiness of each page. It then shades the text in deepening hues of orange to signal dubious content. A 1,000-page demonstration version is already available on a web page operated by the program's creator, Luca de Alfaro, associate professor of computer engineering at UCSC.

Other sites already employ user ratings as a measure of reliability, but they typically depend on users' feedback about each other. This method makes the ratings vulnerable to grudges and subjectivity. The new program takes a radically different approach, using the longevity of the content itself to learn what information is useful and which contributors are the most reliable.
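To make the longevity idea concrete, here is a toy sketch in Python. This is entirely our own simplification, not the UCSC algorithm: the function names, the reward/penalty values, and the shading formula are all illustrative assumptions.

```python
# Toy model of content-longevity reputation (our own simplification).
# Each later revision either keeps or undoes an author's contribution;
# reputation rises with survival and falls with reverts, and text is
# shaded orange according to its author's reputation.

from collections import defaultdict

reputation = defaultdict(float)  # author -> reputation; new authors start at 0.0


def record_revision(prev_author, survived):
    """Update prev_author's reputation once a later revision judges their edit."""
    if survived:
        reputation[prev_author] += 1.0   # contribution lived on
    else:
        reputation[prev_author] -= 2.0   # contribution was undone; penalize harder


def trust_shade(author, max_rep=10.0):
    """Map an author's reputation to an orange shade: 0.0 = white, 1.0 = deep orange."""
    rep = max(0.0, min(reputation[author], max_rep))
    return 1.0 - rep / max_rep


record_revision("alice", survived=True)   # alice's edits keep surviving
record_revision("alice", survived=True)
record_revision("bob", survived=False)    # bob's edit was undone
```

Note that a brand-new author defaults to reputation 0 and so is shaded fully orange, which matches the paper's point (quoted further below) that newcomers cannot be presumed innocent.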

+ UCSC Wiki Lab: Wikipedia trust coloring demo

Source: University of California-Santa Cruz

A very interesting read! Some quick things we thought about as we read it:

+ This demo is based on a few thousand pages. We wonder how, or if, this system would scale from that relatively small sample to the massive amount of constantly changing and expanding content that makes up Wikipedia at the present time. What about future growth? Wikipedia is likely to grow larger, with new entries added regularly. Can this system handle it? The larger the corpus of data becomes, the more difficult it can be to maintain, even assuming that the Wikipedia user/editor base stays the same. What if volunteers move on to the "next big thing"? And what does having a large and respected (by many) database of old, incorrect, out-of-date, and potentially manipulated data mean for the researcher who has come to depend on Wikipedia?

+ The demo is based on Wikipedia as of a specific date. As we've mentioned in the past, anyone can download Wikipedia data, claim it as their own, and place ads on it. Wikipedia's management does not require whoever downloads the content to make changes/updates to it. So one of Wikipedia's positive attributes, quick changes, is lost.

+ We have seen and read how easy it is to manipulate content on other sites.
From the web site: "authors whose contributions are undone lose reputation..." Can this also be gamed? We've heard many stories in which Wikipedia material that is accurate and added by honest Wikipedians has been removed, replaced, or changed because an editor "doesn't like it". This content might be a specific statistic (i.e. the population of "xxx") or an entire entry. This News.com article tells the story of one News.com writer.

+ Can this system alert editors to "long tail" material that has not been updated in xxx amount of time? It's one thing to have a Wikipedia entry; it's something else for that entry to be updated with the latest information. Popular topics are checked constantly, but doesn't "long tail" material deserve the same review and updating? As the database grows larger this will become more of a challenge. Can this system help?

+ Could authors with strong, positive reputations begin "selling" their reputations by placing or changing content for others?


...changes performed by low-reputation authors have a significantly larger than average probability of having poor quality, as judged by human observers, and of being later undone, as measured by our algorithms.

How does one measure quality? Quality of writing style? Source of information? Currency of information? How does one measure content that might be undone because someone else just doesn't like it for one reason or another?

+ Since anyone can download Wikipedia data and use it on their own site, how would a typical user know and understand that they might be viewing pages that do not reflect the rankings this algorithm provides?

ResourceShelf Contributing Editor, Dan Giancaterino adds:

The problem with basing this system on author reputation is that it can be gamed. Nothing would prevent someone from making lots of edits to obscure pages. Since very few people actually see those edits, there's little chance of their being reverted. Bingo ... cheap reputation. They've got to find a way to factor the importance of the page being edited into the author reputation algorithm. And all this obscures the real point: I want/need an accurate answer to something, not shades of orange.
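Dan's suggestion could be folded into such a system along these lines. This is our own sketch: using page views as a proxy for page importance is our assumption and appears nowhere in the paper, and the function name is hypothetical.

```python
import math


def weighted_reputation_gain(base_gain, page_views):
    """Scale a reputation gain by page importance so that edits to obscure
    pages earn proportionally less credit. The log dampens the advantage of
    editing extremely popular pages. (Our own illustration, not the paper's.)"""
    importance = math.log10(page_views + 1)
    return base_gain * importance
```

Under this weighting, an edit to a page nobody views earns zero reputation, blunting the "cheap reputation" attack on obscure pages.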

+ Kudos to the authors for also pointing out this issue in their paper. Yes, a new user is considered guilty until their reputation (over time) proves otherwise. Sad but true, and as we know, constantly generating new user names to place or change content is a closely related issue, and likely easily done by someone who can write the code. Additionally, what about the Wikipedia user who registers (with good intentions) just to change/update/modify a single entry?

Wikipedia allows users to register, and create an author identity, whenever they wish. As a consequence, we need to make the initial reputation of new authors very low, close to the minimum possible (in our case, 0). If we made the initial reputation of new authors any higher, then authors, after committing revisions that damage their reputation, would simply re-register as new users to gain the higher value. An unfortunate side-effect of allowing people to obtain new identities at will is that we cannot presume that people are innocent until proven otherwise: we have to assign to newcomers the same reputation as proven offenders.
This is a contributing factor to our reputation having low precision: many authors who have low reputation still perform very good quality revisions, as they are simply new authors, rather than proven offenders.
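The re-registration argument in that quote can be stated as a one-liner. This is our own illustration (the function name is hypothetical): if the initial reputation were above the minimum, a damaged author would re-register whenever that pays off, so the initial value becomes a floor every author can guarantee themselves.

```python
def best_reputation(current_rep, initial_rep):
    """The reputation a self-interested author can always reach: keep the
    current identity, or re-register and take the initial value, whichever
    is higher. Hence initial_rep must sit at the minimum possible value."""
    return max(current_rep, initial_rep)
```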

+ Might having some confirmed and accurate information about the author/contributor also be of value and cut down on these and similar problems? In other words, signed articles that also link to that person's background, reputation, etc. This is, to some degree, what Larry Sanger and the Citizendium project are up to.
From the Citizendium web site:

+ We offer gentle expert oversight.
+ We use our real names, not pseudonyms.

