More Facts About the Library of Congress / Twitter Archive; Includes Text of LC/Twitter Agreement
UPDATE (8/30/2010): We've learned from the Library of Congress that they're still deciding what credentials, beyond a reader ID card, a person would need to access the archive. Also, as we said in earlier posts, some curated portions of the Twitter Archive will likely be accessible on the Internet, but it still needs to be decided whether the Twitter Archive will be accessible on the web.
BOTTOM LINE: All of the decisions that need to be made are still up in the air.
A couple of weeks ago a massive amount of attention was given to the fact that a copy of the Twitter Archive (back to day one in 2006) along with rolling updates would be a gift to the Library of Congress from Twitter. In fact, Twitter approached LC about the project.
It was also one of the only times that we can think of in the Google "age" that another organization, in this case the Library of Congress, released a similar service on the same day and received more attention than Google.
Earlier that same day, Google announced that "Google Replay" was live on the web and that, while very small at launch, the entire archive would eventually be searchable back to day one in 2006.
What we thought was perhaps most interesting is that very few news organizations of any kind made it clear that the LC Twitter Archive would NOT BE ACCESSIBLE to the general public, either online or at the Library of Congress itself, while the Google version of the archive was already online and would be accessible to anyone with a web connection.
The FAQ continues with why it's important to archive and preserve Twitter and, as info pros and many others know, LC collects a wide range of materials. Matt writes:
Individually tweets might seem insignificant, but viewed in the aggregate, they can be a resource for future generations to understand life in the 21st century.
That's true. However, in the original news release from LC, a link to a tweet by President Obama the morning after his victory speech in Chicago is provided. In our view, it's a single tweet that had significance as it was posted and continues to have significance today and will likely have it years from now. A single tweet or a small group of tweets that seem insignificant today might have great significance in the future be it 20, 50, or 100 years from now. Of course, the opposite may also be true.
Next, we read that deleted tweets, private account info, and links to pictures and websites will not be archived. So, if someone (or the masses) tweets a link to an important government report, the link showing where to find it will not be accessible from the archive. That's a new fact to us and seems a bit strange. Links are central to what Twitter can do. We also learn that LC does not plan to "collect" the linked sites.
Of course, the Internet Archive, Archive-It, and other projects including those from Harvard U. and the University of California are collecting and archiving sites.
Finally, on the six-month window between the time a tweet is posted and the time it has the potential to reach the database: we don't know how often the Twitter archive will be updated (daily, monthly, quarterly). We mentioned some of this in our April 19th report.
The FAQ concludes with some ideas about how LC wants to use this archive as a tool to learn more about digital preservation, as a case study for developing processes for usage, and for developing tools for researcher access (read about the Stanford group in the April 19th post), "as well as from the Library’s ongoing experience with serving collections and protecting privacy and rights."
LC will NOT try to reproduce Twitter's functionality (we're guessing that has more to do with retrieval of material than the actual posting of tweets). Two examples of archives already online that might serve as examples of what LC could do with the Twitter material are the National Elections Web Archive and the Supreme Court Nominations Web Archive. They will make an announcement when they are ready for researcher use.
Perhaps the bottom line is that a lot of decisions at LC have to be made. We said that two weeks ago. Patience! It's also why they are calling this a starter list of FAQs.
However, issues remain that we were surprised not to have seen mentioned in this first FAQ since they were mentioned in the press.
The collection is browsable, searchable and was commissioned by LC from The Internet Archive. In other words, the Library of Congress did not do the actual crawling and archiving.
We have been told that no part of the Twitter Archive would be web accessible and available to the general public. It would only be available to qualified researchers (a definition that is TBD) at the Library of Congress. No access would be available outside of the LC buildings. In our view, this example confuses the situation.
C) To be clear, we weren't able to find a mention of who will and who will not have access to LC's Twitter Archive.
D) Will LC make any of this content accessible to the public perhaps through a display at one of the LC buildings in D.C.?
E) Who will be doing the actual archiving? We know that it's very likely LC will be handling the preservation work but will LC have the technology in place to capture each tweet? What we think is most likely to happen is Twitter will be sending files to LC (on a predetermined basis) via the Internet.
That's about it. If/when we think of more we'll add them.
We want to be clear. We are beyond thrilled that LC is part of the project. As they point out in the FAQ, if nothing else this will be a tremendous learning experience for future digital archiving projects with the scope and notoriety that Twitter provides. Also, kudos to Twitter for thinking of LC and offering the archive as a gift.
Of course, there is one question only Microsoft and Twitter can answer: is Bing also going to offer a Twitter archive?