The public interface for Harvard's new Web Archive Collection Service (WAX) launched on February 4, 2009. WAX began as a pilot project in July 2006, funded by the University's Library Digital Initiative (LDI), to address how collection managers manage web sites for long-term archiving. It was the first LDI project specifically oriented toward preserving "born-digital" material. WAX has now transitioned to a production system supported by the University Library's central infrastructure.
Collection managers, working in the online environment, must continue to acquire the content that they have always collected physically. With blogs supplanting diaries, e-mail supplanting traditional correspondence, and HTML materials supplanting many forms of print collateral, collection managers have grown increasingly concerned about potential gaps in the documentation of our cultural heritage.
WAX was developed as an initial--and only partial--response to these and other concerns, which range from technical feasibility to legal and financial implications. The pilot focused on harvesting content from the surface web--content that is discoverable to search engines through web crawlers, as opposed to content hidden from web crawlers in a database or restricted by password or login protection.
Note: Of course, don't forget about the Wayback Machine from the Internet Archive (IA), now home to over 150 billion archived web pages. The IA also does "custom" web archiving via its very cool Archive-It service.