
Monday, 19th April 2010

Resource of the Week: The Twitter Archives from the Library of Congress & Google: The Facts As We Know Them

UPDATE (8/30/2010): We've learned from the Library of Congress that they're still deciding what credentials, beyond a reader ID card, a person would need to access the archive. Also, as we said in earlier posts, some curated portions of the Twitter archive will likely be accessible on the Internet, but it still needs to be decided whether the Twitter Archive itself will be accessible on the web.

BOTTOM LINE: All of the decisions that need to be made are still up in the air.

By Gary Price, Founder and Senior Editor

This has been one busy week. The stream of news and new resources never -- and we mean never -- slowed down.

But as it turns out, one story stands above the rest this week -- the new Twitter archives and, to be more specific, the announcement that the Library of Congress would be getting a copy. A few hours before LC began to get the word out (via a tweet, appropriately), Google announced they were already online with a searchable version of the Twitter archive. As of today, Google's Twitter archive only goes back a few months to February 2010, but "eventually" the entire archive back to day one will be available and searchable.

What we would like to do in this post is go over the facts and, where we don't have the exact info we need, take an educated guess at the answers. Keep in mind that things do change and, in some cases, further details need to be discussed and decisions need to be made.

We read all of the primary documents (links are available), used the Google service, and were fortunate enough to have a telephone chat with a spokesperson from LC. We also read some "way out" stuff (e.g., the Library of Congress bought Twitter), but most of the time just a fact or two was either missing or a bit "off".

So, with all of that out of the way, let's get to the details.

The Library of Congress Twitter Archive

Update: Martha Anderson, the director of the National Digital Information Infrastructure and Preservation Program (NDIIPP), was interviewed by Phoebe Connelly from The American Prospect. We've read the interview and added some facts and quotes from it to our post. Look for the three asterisks ***.

+ The Library of Congress Twitter archive will not be accessible to and searchable by the general public on the Internet or at the Library of Congress in Washington D.C. However, the archive will be accessible to researchers on-site at LC. Details about researcher access will be developed and made public in the next few months, but it's likely a researcher will have to certify his or her identity by at least signing a form. Again, exact details are forthcoming.

As of today, if you want to search back a couple of years to look at tweets, Google is your only option.

Yes, we still wonder if Bing or a smaller player will offer another online/public/searchable Twitter archive.

+ The Library of Congress did not buy anything from Twitter. The archive is a gift from Twitter.

*** "Twitter approached us." --Martha Anderson, NDIIPP

Will A) the Library of Congress get updates for the archive from Twitter (so Twitter does the actual archiving and LC does the preservation work), or will B) LC do both the actual archiving and the preservation? From what we've pieced together, our best guess at this time is A.

*** "There will be rolling updates" --Martha Anderson (via The American Prospect)

If choice A is accurate, how will the updates get to LC? Will they be physically sent on hard drives via UPS or FedEx, or will massive files -- perhaps a month's worth of tweets -- be sent and downloaded via the Internet? With networks getting faster and faster (LC is a member of Internet2), along with software allowing data to move more quickly, it's more likely to be a download situation.

+ There will be an embargo/delay of six months before new tweets enter the Library of Congress Twitter database, as mentioned in the official Twitter announcement/blog post.

*** "There's a built in six-month window, so we don't have the live Twitter archive at any given time. There is a window for people if they want to delete their tweets, things like that." --Martha Anderson, NDIIPP (via American Prospect)

+ The Library of Congress Twitter Archive WILL BE accessible to Library of Congress staff for internal use only; for non-commercial research by qualified and credentialed researchers -- those terms still need to be defined, as mentioned earlier -- and for limited public display by LC.

Finally, LC can do preservation work on the tweets.

This sentence was included in the Library of Congress news release:

While the Twitter archive will not be posted online, the Library envisions posting selected content around topics or themes, similar to existing VHP (Veterans History Project) presentations.

After reading the "the Library envisions..." sentence in the news release, we once again know that the Twitter archive itself will not be put online by LC.

This ResourceShelf post contains links to the primary documents mentioned in this section.

*** In the interview with The American Prospect, we learn that a group from Stanford U. will be involved with the Twitter Archive at LC.

We have a partnership with Stanford University, a bunch of very bright mathematical grad students who have been helping us understand how to mine even our digital collections here. We hope to put them to work building tools to help people make order out of it.

They've done some really interesting work for us on these digitized reports from the [Works Progress Administration] during the Great Depression. They were personal narratives -- people went out and interviewed people all over the country. It's in English, but it's colloquial sometimes. They've helped us get into it and make sense of it, because full-text searching doesn't always do the trick. -- Martha Anderson, NDIIPP ***

--

The Google Twitter Archive

+ Like most things Google, historical searching of tweets has a name. It's called Google Replay.

+ As of today, you CAN search using Google Replay only back to February 2010, with a minimal delay for new tweets. There is NO embargo/delay of tweets using Google Replay. "Eventually" (that term is not defined), the entire Twitter archive will be accessible and searchable using Google Replay by anyone from any computer that can access Google. BTW, this is what the Twitter home page looked like on September 30, 2006.

+ Google Replay uses the familiar Google timeline interface (as used with Google News for some time) where you can manipulate the timeline to narrow the focus down to the minute. (Note the bar that sits on the timeline; it moves.)

+ If you want to go directly to Google Replay, this link should get you there.

The example uses "Philadelphia" as the search term on February 14, 2010, from 4:32pm – 4:43pm. Also, notice the month and date; those can be changed. You can use the arrows to move ahead or back a few minutes at a time, and you can jump right to "now" with a single click.

The material for the Google Replay section, including primary documents, can be accessed here.

Summary

Both services are needed. Will others come into play?

The LC Archive is essential. It's going to receive cutting-edge preservation; it will allow qualified researchers from LC and elsewhere to mine the data; it might even create a new exhibit at LC. However, it's not a publicly accessible research tool. I do wonder if people will show up wanting to use the database and not be able to. I would imagine the same thing happens regularly with LC users wanting to exit the library with LC materials, or with people phoning LC to ask if they have a particular book and whether they can get it sent to them.

Google Replay IS for the public. It IS searchable and it IS easily manipulated to assist in focusing a search query. As we said a moment ago, it IS accessible from any computer connected to the web that can reach Google.

Update: "Tweets: What We Might Learn From Mundane Details" (via the AOTUS Blog from the Archivist of the United States, David Ferriero). (Hat Tip: ArchivesNext)

