Wednesday, 9th June 2010
Google Releases the Much Discussed "Caffeine" Index
You might remember that last August Google announced a new project, code-named Caffeine, that was being built to replace the entire infrastructure that Google uses to crawl, index, and rank pages. During the time Caffeine was being tested, especially in those first days after the announcement, some said that they noticed faster speeds in getting searches completed and results returned.
Tonight, Google announced that the Caffeine technology is now live for all Google searches. A blog post from GOOG titled "Our new search index: Caffeine" has the details.
Facts (According to the Google Blog Post):
+ 50% Fresher Results Compared to the Old Indexing System (We Will Try to Get a Precise Definition of What This Means in Terms of Actual Time)
+ Largest Index Ever
+ Every Second Caffeine Processes Hundreds of Thousands of Pages in Parallel
If this were a pile of paper it would grow three miles taller every second.
+ Caffeine Takes Up Nearly 100 million Gigabytes of Storage in One Database
+ Adds New Information at a Rate of Hundreds of Thousands of Gigabytes Per Day
Our old index had several layers, some of which were refreshed at a faster rate than others; the main layer would update every couple of weeks. To refresh a layer of the old index, we would analyze the entire web, which meant there was a significant delay between when we found a page and made it available to you.
With Caffeine, we analyze the web in small portions and update our search index on a continuous basis, globally. As we find new pages, or new information on existing pages, we can add these straight to the index. That means you can find fresher information than ever before, no matter when or where it was published.
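To make the difference between the two approaches concrete, here is a toy sketch in Python. It is not Google's implementation (the blog post gives no code); the class names `BatchIndex` and `IncrementalIndex` and the example URLs are made up for illustration. The first class only changes when the whole collection is re-analyzed, the way a layer of the old index was refreshed. The second accepts one page at a time, so a new page is searchable as soon as it is added.

```python
from collections import defaultdict


class BatchIndex:
    """Toy index that only changes when the whole corpus is re-analyzed."""

    def __init__(self):
        self.postings = {}

    def rebuild(self, corpus):
        # corpus: dict mapping URL -> page text (hypothetical input shape).
        fresh = defaultdict(set)
        for url, text in corpus.items():
            for term in text.lower().split():
                fresh[term].add(url)
        self.postings = dict(fresh)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


class IncrementalIndex:
    """Toy index that is updated one page at a time."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.terms_by_url = {}

    def add_or_update(self, url, text):
        # Drop the page's old terms, then record the new ones.
        for term in self.terms_by_url.get(url, set()):
            self.postings[term].discard(url)
        terms = set(text.lower().split())
        for term in terms:
            self.postings[term].add(url)
        self.terms_by_url[url] = terms

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


# Both indexes start with the same single page.
corpus = {"a.example/1": "old news"}
batch = BatchIndex()
batch.rebuild(corpus)
incremental = IncrementalIndex()
incremental.add_or_update("a.example/1", "old news")

# A new page is found. The incremental index takes it right away;
# the batch index will not see it until the next full rebuild.
corpus["a.example/2"] = "caffeine launch"
incremental.add_or_update("a.example/2", "caffeine launch")

print(batch.search("caffeine"))        # prints []
print(incremental.search("caffeine"))  # prints ['a.example/2']
```

The gap between finding a page and being able to search for it is the "significant delay" described above: in the batch version it lasts until the next `rebuild`, while in the incremental version it is only as long as one `add_or_update` call.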
For the searcher, nothing has changed in terms of how to construct a search. However, if pages are being refreshed more frequently, it means the cache is also being updated more frequently. So, if you want a copy of a page the way it looked at noon on Wednesday, it's probably a good idea to make a copy for yourself (have you tried Zotero?). Why? Because by 12:15 on Wednesday the content on the page might have changed, and that means the cache has been updated. This new index could bring more attention to the importance of personal index management.
Do these faster times (I'm sure MANY will be testing to see how accurate Google's numbers are) mean anything to the typical Google searcher? Obviously, for the "power" searcher the potential for better results seems strong.
Remember when all search engines posted their total index size on their homepages? It meant little, if anything, and it's no longer being done. Will recrawl and refresh times be a new metric that search engines use to promote/market themselves to users?
See Also: Vanessa Fox at Search Engine Land Has a Great Post That Should Be Read by All Content Owners and Webmasters (via SEL)
Note: Vanessa makes an essential point. The Caffeine index has not changed Google's ranking algorithm. Two different things.
Here's one more point that is important to keep in mind. Thanks Vanessa.
Note that the introduction of Caffeine doesn't necessarily mean that pages will be crawled on a faster schedule than before. It simply means that once those pages are crawled, they are made available to searchers much more quickly. (Remember, you can estimate how often your pages are crawled by taking a look at your server logs or checking the cache dates in Google.)
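For anyone who wants to try the server log approach, here is a rough Python sketch. It assumes your web server writes the common Apache/nginx "combined" log format and that Googlebot requests can be spotted by the string "Googlebot" in the user-agent field. Both are assumptions about your setup, and the function names are ours, not part of any tool. User-agent strings can be faked, so treat the result as an estimate.

```python
import re
from collections import defaultdict
from datetime import datetime

# Combined log format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_visits(lines):
    """Return {path: [datetime, ...]} for requests whose agent mentions Googlebot."""
    visits = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
            visits[match.group("path")].append(when)
    return visits


def average_crawl_interval_hours(timestamps):
    """Average gap between consecutive visits in hours, or None with fewer than two."""
    if len(timestamps) < 2:
        return None
    ordered = sorted(timestamps)
    span = ordered[-1] - ordered[0]
    return span.total_seconds() / 3600 / (len(ordered) - 1)
```

To use it, pass the lines of an access log to `googlebot_visits`, for example `googlebot_visits(open("access.log"))` (the file name is a placeholder), then call `average_crawl_interval_hours` on the list for any page you care about. Keep in mind the point made above: this measures how often a page is crawled, not how quickly the crawled copy shows up in search results.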
UPDATE: We posted this item at 10pm EDT. It was in the main Google database less than a minute after we posted it. Impressive!