More than Four Million Permanently Archived Web Pages from U.S. Congress Now Available via 109th Congress Web Harvest
Last year we told you about the Presidential Term "Web Harvest" from the National Archives (NARA) and The Internet Archive, the people who give us The Wayback Machine.
Key Facts:
+ Over 75 Million Archived .gov and .mil Pages (6.5 Terabytes)
+ Unlike The Wayback Machine, Keyword Searchable Using Nutch Technology. You can also begin your search with a specific URL.
+ More Here and Here.
What does the 109th Congress Web Harvest contain?
+ More than four million pages (42 GB) crawled and archived between 11/11/06 and 12/11/06
+ Browse by Member's Name
+ Browse by Committee Name
+ Browse by Leadership
+ Browse by House or Senate Organizations
The harvest produced a public reference copy of the web sites for the purpose of continual availability to the public, and also produced a record copy to be retained in the holdings of NARA...Web sites included in the harvest were identified from information provided by the Web Systems Branch of the House Information Resources staff and by Senate webmasters in the Offices of the Secretary of the Senate and the Sergeant at Arms.
The crawl was done using The Internet Archive's open-source Heritrix crawler.
See Also: Dr. James Billington, Librarian of Congress, recently told a U.S. House Committee that the average life of a web site is between 44 and 75 days.