Google Releases the Much Discussed "Caffeine" Index
You might remember that last August Google announced a new project, code-named Caffeine, that was basically being built to replace the entire infrastructure that Google uses to crawl, index, and rank pages. While Caffeine was being tested, especially in the first days after the announcement, some said they noticed faster speeds in getting searches completed and results returned.
Tonight, Google has announced that the Caffeine technology is now live for all Google searches. A blog post from GOOG titled "Our new search index: Caffeine" has the details.
Facts (According to the Google Blog Post):
+ 50% Fresher Results Compared to the Old Indexing System (We Will Try to Get a Precise Definition of What This Means in Terms of Actual Time)
+ Largest Index Ever
+ Every Second Caffeine Processes Hundreds of Thousands of Pages in Parallel
If this were a pile of paper it would grow three miles taller every second.
+ Caffeine Takes Up Nearly 100 million Gigabytes of Storage in One Database
+ Adds New Information at a Rate of Hundreds of Thousands of Gigabytes Per Day
Our old index had several layers, some of which were refreshed at a faster rate than others; the main layer would update every couple of weeks. To refresh a layer of the old index, we would analyze the entire web, which meant there was a significant delay between when we found a page and made it available to you.
With Caffeine, we analyze the web in small portions and update our search index on a continuous basis, globally. As we find new pages, or new information on existing pages, we can add these straight to the index. That means you can find fresher information than ever before—no matter when or where it was published.
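To make the difference between the two approaches concrete, here is a toy sketch in Python. It is not Google's code or design; the function and class names (rebuild_index, IncrementalIndex) are made up for illustration, and a real index is vastly more involved. It only contrasts rebuilding an index from every page with updating it one page at a time.

```python
from collections import defaultdict


def tokenize(text):
    """Split text into a set of lowercase terms (deliberately naive)."""
    return set(text.lower().split())


def rebuild_index(pages):
    """Batch style: build the whole index again from every known page."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


class IncrementalIndex:
    """Continuous style: touch only the entries affected by one page."""

    def __init__(self):
        self.index = defaultdict(set)   # term -> set of URLs
        self.terms_by_url = {}          # URL -> terms seen at last update

    def update(self, url, text):
        new_terms = tokenize(text)
        old_terms = self.terms_by_url.get(url, set())
        # Remove the page from terms it no longer contains.
        for term in old_terms - new_terms:
            self.index[term].discard(url)
            if not self.index[term]:
                del self.index[term]
        # Add the page under terms that are new to it.
        for term in new_terms - old_terms:
            self.index[term].add(url)
        self.terms_by_url[url] = new_terms

    def search(self, term):
        return set(self.index.get(term.lower(), set()))
```

In the batch version a changed page only becomes searchable after the next full rebuild. In the incremental version a single update call makes the new text searchable right away, which is the behavior the quoted passage describes.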
For the searcher, nothing has changed about how to construct a search. However, if pages are being refreshed more frequently, the cache is also being updated more frequently. So, if you want a copy of a page the way it looked at noon on Wednesday, it's probably a good idea to make a copy for yourself (have you tried Zotero?). Why? Because by 12:15 on Wednesday the content on the page might have changed, and that means the cache has been updated. This new index could bring more attention to the importance of personal index management.
Do these faster times (I'm sure MANY will be testing to see how accurate Google's numbers are) mean anything to the typical Google searcher? Obviously, for the "power" searcher the potential for better results seems strong.
Remember when all the search engines posted their total index size on their homepages? It meant little, if anything, and it's no longer being done. Will recrawl and refresh times be a new metric that search engines use to promote and market themselves to users?
Note: Vanessa makes an essential point. The Caffeine index has not changed Google's ranking algorithm. Two different things.
Here's one more point that is important to keep in mind. Thanks, Vanessa.
Note that the introduction of Caffeine doesn’t necessarily mean that pages will be crawled on a faster schedule than before. It simply means that once those pages are crawled, they are made available to searchers much more quickly. (Remember, you can estimate how often your pages are crawled by taking a look at your server logs or checking the cache dates in Google.)
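If you want to try the server log suggestion, here is a rough Python sketch. It assumes your server writes the common "combined" access log format and that Googlebot identifies itself in the user-agent string (which can be spoofed, so treat the numbers as an estimate). The function names are our own, not part of any tool.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed layout: combined log format, e.g.
# IP - - [08/Jun/2010:22:00:00 -0400] "GET /path HTTP/1.1" 200 5120 "referer" "agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_visits(lines):
    """Return {path: [datetime, ...]} for requests whose agent mentions Googlebot."""
    visits = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
            visits[match.group("path")].append(when)
    return visits


def average_gap_hours(times):
    """Average hours between consecutive visits, or None with fewer than two."""
    if len(times) < 2:
        return None
    ordered = sorted(times)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps) / 3600
```

To use it, open your access log, pass the file object to googlebot_visits, and then call average_gap_hours on the list of visit times for any page you care about.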
UPDATE: We posted this item at 10pm EDT. It was in the main Google database less than a minute after we posted it. Impressive!