A Selected List of Web Page Preservation and Archiving Projects
Before we begin, please note that this is far from a comprehensive list; it's just a beginning. Many large web archiving projects, in many languages, are coming online all the time, and others already exist that we did not mention in this first go-around. In other words, more to come.
The European Archive Foundation said Thursday it has launched its massive digital library of free music and film. The nonprofit organization collaborates with national libraries and other organizations to make non-copyrighted, or free-use, material available to the public.
+ Here are a few of the collections built so far using Archive-It technology from the Internet Archive. Learn about each of these archives and find links to many more on this page.
The 2004 Presidential Term Web Harvest is a National Archives and Records Administration (NARA) project that produced a collection of federal web sites copied, or harvested, from the World Wide Web between 10/14/04 and 11/19/04. The Heritrix web harvester and a list of 982 active and unrestricted second-level URLs were used to capture all linked federal sites down to the fourth level. Those initial 982 ".gov" and ".mil" URLs were provided by the U.S. General Services Administration's (GSA) ".GOV" Internet Domain Registry and the Defense Information Systems Agency (DOD/DISA)... The harvest collection contains approximately 6.5 terabytes of information, roughly 75 million web pages, and represents about 50,000 ".gov" and ".mil" unrestricted federal web sites active between 10/14/04 and 11/19/04.
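For readers wondering what capturing "down to the fourth level" from a seed list might look like in practice, here is a minimal sketch in Python. It is not Heritrix and not NARA's actual configuration. It assumes that "level" means the number of link hops from a seed, with the seed itself counted as level 1, and it walks a small made-up link table (the `TOY_LINKS` dictionary and the `example.gov` and `example.mil` addresses are invented for illustration) instead of fetching real pages.

```python
from collections import deque


def harvest(seeds, get_links, max_level=4):
    """Breadth-first, level-limited walk over a link graph.

    seeds     -- starting URLs, treated here as level 1 (an assumption)
    get_links -- function returning the URLs that a given URL links to
    max_level -- deepest level to capture; links on pages at this level
                 are not followed
    Returns a dict mapping each captured URL to the level it was found at.
    """
    captured = {}
    queue = deque((url, 1) for url in seeds)
    while queue:
        url, level = queue.popleft()
        if url in captured:
            continue  # already captured at the same or a shallower level
        captured[url] = level
        if level == max_level:
            continue  # depth limit reached, do not follow further links
        for target in get_links(url):
            if target not in captured:
                queue.append((target, level + 1))
    return captured


# Hypothetical link table standing in for real web pages.
TOY_LINKS = {
    "example.gov": ["example.gov/1", "example.mil"],
    "example.gov/1": ["example.gov/2"],
    "example.gov/2": ["example.gov/3"],
    "example.gov/3": ["example.gov/4"],
    "example.mil": ["example.gov"],
}

result = harvest(["example.gov"], lambda url: TOY_LINKS.get(url, []))
for url, level in result.items():
    print(level, url)
```

In this toy run, `example.gov/4` is left out because it sits one hop beyond the fourth level. A real harvester also has to deal with politeness delays, robots rules, scope filters, and storage formats, none of which are shown here.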
PANDORA, Australia's Web Archive, is a growing collection of copies of Australian online publications, established initially by the National Library of Australia in 1996, and now built in collaboration with nine other Australian libraries and other cultural collecting organisations.
Despite our apparent dependence on this medium, very little attention has been paid to the long-term preservation of websites. Indeed, with the life of an average website estimated to be around 44 days (about the same lifespan as a housefly) there is a danger that invaluable scholarly, cultural, and scientific resources will be lost to future generations. To address this problem, a consortium of six leading UK institutions is working collaboratively on a project to develop a test-bed for selective archiving of UK websites. View the project timeline here.
A book that provides a plainspoken and thorough introduction to the web for historians who wish to produce online work, or to build upon and improve the projects they have already started.