Resource of the Week: The Twitter Archives from the Library of Congress & Google: The Facts As We Know Them
UPDATE (8/30/2010): We've learned from the Library of Congress that they're still deciding what credentials, beyond a reader ID card, a person would need to access the archive. Also, as we said in earlier posts, some curated portions of the Twitter archive will likely be accessible on the Internet, but it has not yet been decided whether the Twitter archive as a whole will be accessible on the web.
BOTTOM LINE: All of the decisions that need to be made are still up in the air.
By Gary Price, Founder and Senior Editor
This has been one busy week. The stream of news and new resources never -- and we mean never -- slowed down.
But as it turns out, one story stands above the rest this week -- the new Twitter archives and, to be more specific, the announcement that the Library of Congress would be getting a copy. A few hours before LC began to get the word out (via a tweet, appropriately), Google announced it was already online with a searchable version of the Twitter archive. As of today, Google's Twitter archive goes back only a few months, to February 2010, but "eventually" the entire archive, back to day one, will be available and searchable.
What we would like to do in this post is go over the facts and, where we don't have the exact info we need, take an educated guess at the answers. Keep in mind that things do change and, in some cases, further details need to be discussed and decisions need to be made.
We read all of the primary documents (links are available), used the Google service, and were fortunate enough to have a telephone chat with a spokesperson from LC. We also read some "way out" stuff (e.g., that the Library of Congress bought Twitter), but most of the time just a fact or two was either missing or a bit "off".
So, with all of that out of the way, let's get to the details.
+ The Library of Congress Twitter archive will not be accessible to, or searchable by, the general public, either on the Internet or at the Library of Congress in Washington, D.C. However, the archive will be accessible to researchers on-site at LC. Details about researcher access will be developed and made public in the next few months, but it's likely a researcher will have to certify his or her identity by at least signing a form. Again, exact details are forthcoming.
As of today, if you want to search back a couple of years to look at tweets, Google is your only option.
Yes, we still wonder if Bing or a smaller player will offer another online/public/searchable Twitter archive.
+ The Library of Congress did not buy anything from Twitter. The archive is a gift from Twitter.
+ One question remains open: A) will the Library of Congress get updates for the archive from Twitter (so Twitter does the actual archiving and LC does the preservation work), or B) will LC do both the archiving and the preservation? From what we've pieced together, our best guess at this time is A.
*** "There will be rolling updates" --Martha Anderson (via The American Prospect)
If choice A is accurate, how will the updates get to LC? Will they be physically sent on hard drives via UPS or FedEx, or will massive files -- perhaps a month's worth of tweets -- be sent and downloaded via the Internet? With networks getting faster and faster (LC is a member of Internet2), along with software that lets data move more quickly, a download seems more likely.
+ There will be an embargo/delay of six months before new tweets enter the Library of Congress Twitter database, as mentioned in the official Twitter announcement/blog post.
*** "There's a built in six-month window, so we don't have the live Twitter archive at any given time. There is a window for people if they want to delete their tweets, things like that." --Martha Anderson, NDIIPP (via American Prospect)
+ The Library of Congress Twitter Archive WILL BE available to Library of Congress staff for internal use; for non-commercial research by qualified and credentialed researchers -- those terms still need to be defined, as mentioned earlier -- and for limited public display by LC.
Finally, LC can carry out preservation work on the tweets.
This sentence was included in the Library of Congress news release:
*** They've done some really interesting work for us on these digitized reports from the [Works Progress Administration] during the Great Depression. They were personal narratives -- people went out and interviewed people all over the country. It's in English, but it's colloquial sometimes. They've helped us get into it and make sense of it, because full-text searching doesn't always do the trick. -- Martha Anderson, NDIIPP
--
The Google Twitter Archive
+ Like most things Google, historical searching of tweets has a name. It's called Google Replay.
+ As of today, you CAN search with Google Replay, but only back to February 2010, and new tweets show up with minimal delay. There is NO six-month embargo on tweets in Google Replay. "Eventually" (that term is not defined), the entire Twitter archive will be accessible and searchable using Google Replay by anyone from any computer that can access Google. BTW, this is what the Twitter home page looked like on September 30, 2006.
+ Google Replay uses the familiar Google timeline interface (as used with Google News for some time), where you can manipulate the timeline to narrow the focus down to the minute. (Note the bar that sits on the timeline; it moves.)
+ If you want to go directly to Google Replay, this link should get you there.
The example uses "Philadelphia" as the search term on February 14, 2010, from 4:32pm to 4:43pm. Also, notice the month and date; those can be changed. You can use the arrows to move forward or back a few minutes at a time, and you can jump to right "now" with a single click.
The material for the Google Replay section, including primary documents, can be accessed here.
Summary
Both services are needed. Will others come into play?
The LC Archive is essential. It's going to receive cutting-edge preservation; it will allow qualified researchers from LC and elsewhere to mine the data; it might even lead to a new exhibit at LC. However, it's not a publicly accessible research tool. I do wonder whether people will show up wanting to use the database, only to find they can't. I imagine the same thing happens regularly when LC users want to leave the library with LC materials, or when people phone LC asking whether it has a particular book and whether it can be sent to them.
Google Replay IS for the public. It IS searchable and it IS easily manipulated to assist in focusing a search query. As we said a moment ago, it IS accessible from any computer connected to the web that can reach Google.
The FreePint Family is a family of resources to help information workers be more effective, raise the value of information in their organisations and contribute to success.