Friday, 30th October 2009
The Library of Congress Unveils API for Chronicling America Digitized Newspaper Database and Directory
What follows is a post that might be of special interest to web developers, webmasters, site owners, or anyone who can work with an API (Application Programming Interface). The API draws on a digitized collection of more than 1 million historic newspaper pages and a searchable directory of newspaper information. Even if you don't have the technical skills required, you may know someone who does, and with their help you can partner to develop new resources, create mashups, etc. By the way, if you know people who are able to work with an API, feel free to share this post with them.
First, some background.
We've posted about the Chronicling America (CA) program since the day it launched in March 2007. The project is a joint effort between the Library of Congress and the National Endowment for the Humanities to digitize historic American newspapers. In addition to the digitized newspaper database, CA also provides the Chronicling America directory. It's both browsable and searchable with a powerful interface (a great example of what good metadata can do). The directory contains information about most American newspapers published from 1690 to today.
On June 16, 2009, we ran a story about CA reaching a milestone: it had just hit the one million digitized pages mark. It has grown a lot since then. About five weeks ago we posted an item about CA adding more than 192,000 pages. The media release said the database at that time contained 1,442,000 digitized pages from 171 titles published between 1880 and 1922.
Thanks for the info, but what about the API (Application Programming Interface)?
The following is from the "About the Chronicling America API" web page:
Chronicling America provides access to information about historic newspapers and select digitized newspaper pages. To encourage a wide range of potential uses, we designed several different views of the data we provide, all of which are publicly visible. Each uses common Web protocols, and access is not restricted in any way. You do not need to apply for a special key to use them. Together they make up an extensive application programming interface (API) which you can use to explore all of our data in many ways.
The rest of the web page offers technical details about the API.
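To give a feel for what "no key required" means in practice, here is a minimal Python sketch that builds a search URL and fetches the results as JSON using nothing but the standard library. The base URL and the parameter names (andtext, format, page) are assumptions not stated in this post, so check them against the "About the Chronicling America API" page before relying on them. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the search endpoint and its parameter names are not given in
# this post; verify them on the "About the Chronicling America API" page.
BASE_URL = "https://chroniclingamerica.loc.gov/search/pages/results/"


def build_search_url(terms, page=1):
    """Return a page-search URL that asks for JSON. No API key is added."""
    query = {"andtext": terms, "format": "json", "page": page}
    return BASE_URL + "?" + urlencode(query)


def fetch_results(terms, page=1, timeout=30):
    """Fetch one page of search results and decode the JSON body."""
    with urlopen(build_search_url(terms, page), timeout=timeout) as response:
        return json.load(response)
```

Calling build_search_url("world series") only produces the URL string, which you can paste into a browser. fetch_results("world series") makes the actual request, so it needs a network connection and depends on the service still answering at that address.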
Programmable Web has also posted about the new API.
Here are a couple of highlights:
Search results available on the web site appear with terms highlighted. The API does not have access to highlight information, but it does contain thumbnails. Each page has a permalink back to the Library of Congress site, which displays the page in a zoomable, draggable viewer similar to Google Map.
The Library of Congress is focused on making these public domain works widely available. As such, this is an API without any registration or key necessary. That's pretty wide open.
Among the interesting technical details is that the API can return linked data via RDF. It's good to see reference sites, especially government ones, support semantic web formats (there are now 20 APIs in our directory with RDF support.)
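For readers curious what working with RDF output might look like, here is a rough Python sketch that reads titles out of an RDF/XML document. The sample document is made up for illustration and is not real Chronicling America output. The use of dcterms:title and the function name are also assumptions. The sketch handles only plain rdf:Description elements, so a dedicated RDF library would be a better fit for real data.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS_NS = "http://purl.org/dc/terms/"

# Made-up sample for illustration only; not actual API output.
SAMPLE_RDF = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://example.org/newspaper/1">
    <dcterms:title>Example Daily Gazette</dcterms:title>
  </rdf:Description>
</rdf:RDF>
""".strip()


def titles_by_subject(rdf_xml):
    """Map each described resource URI to its dcterms:title, if it has one."""
    root = ET.fromstring(rdf_xml)
    found = {}
    for description in root.iter("{%s}Description" % RDF_NS):
        subject = description.get("{%s}about" % RDF_NS)
        title = description.find("{%s}title" % DCTERMS_NS)
        if subject and title is not None:
            found[subject] = title.text
    return found
```

Running titles_by_subject(SAMPLE_RDF) on the sample returns a dictionary with one entry, keyed by the example.org URI. The point of linked data is that the key is itself a URL you can follow to learn more about the thing described.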
Sources: Library of Congress, Programmable Web
Hat Tip: Dan C.