A Look Around the New Home of the Internet Archive & A Few Comments from Brewster
Rob Pegoraro of the Washington Post headed out to San Francisco and visited Brewster Kahle and his team at the Internet Archive (IA). It's a very interesting article and an excellent primer for anyone interested in one of our favorite resources, one that goes well beyond The Wayback Machine. That said, Wayback is one of the top two or three essential tools for Internet researchers, and it will only become more important as time goes on.
In terms of physical location, the Internet Archive is now housed in a former Christian Science church. They moved to this location from the Presidio of San Francisco last fall. The article also points out that some Internet Archive scanning takes place in a former Christian Science reading room. But this is not the only place where books and other materials are scanned. If you take a look at the various collections of books in the IA, you can see collections digitized at other places. For example, here is the U. of Toronto page.
We were thrilled to see that Pegoraro mentions The Open Library project, which is an Internet Archive "initiative." They just relaunched their database, and we posted an extended item about what the enhanced database can offer users. It's one of the cooler searchable databases we've seen, and it's also great to see that The Open Library is working with LibraryThing and Goodreads.
Here's one quote from the article. Pegoraro asks Kahle about data formats that would work well for long-term storage.
I [Pegoraro] wrapped up our interview by asking Kahle for his preferred file formats for long-term storage, since I get that kind of question fairly often from readers. He said the archive uses FLAC (Free Lossless Audio Compression) for music, had adopted H.264 for video storage after trying five other formats, used JPEG for photos and employed a related format, JPEG 2000, for text-heavy images. But he also said that for personal storage, PDF or nearly universally supported commercial formats -- even Microsoft Office -- would be fine, too.
With the realization that articles in newspapers can't go on forever (and have to pass by several editors), the one thing we would have loved to see is a mention of Archive-It.
If you're unaware of the service, here's a brief overview.
Archive-It is a fee-based service that many non-profits, schools (K-12 and higher ed), libraries, archives, and others use to archive their own websites or collections based on topics of interest to that organization.
As of today, Archive-It has more than 1000 public collections that you can access and search. Plus, unlike with The Wayback Machine, you can use keywords when you search an archived collection. For example, this page lists all of the public collections. Near the top you'll see that the complete ACLU web site is archived here.
Overall, an excellent read and a great resource to share with others, especially those of you who teach web search and discuss The Wayback Machine and the archiving of web content.
Note from Gary: I was able to visit IA HQ the week they moved into this new location. Mucho cool, and I can only imagine that the move-in is now complete and more cool things are going on.