
Wednesday, 25th August 2010

IBM and EU Partner to Enable the Digitization of Historic European Texts on Massive Scale, Two Dozen+ National Libraries Participating

From the News Release:

IBM and the EU have expanded their research collaboration, which now includes more than two-dozen national libraries, research institutes, universities, and companies across Europe to provide new technology that will enable highly-accurate digitization of rare and culturally significant historical texts on a massive scale. Unlike past digitization projects where the result has been static, online libraries of texts, this unique widescale effort, called IMPACT (IMProving ACcess to Text), will offer new tools and best practices to institutions across Europe that will enable them to efficiently and accurately continue to produce quality digital replicas of historically significant texts and make them widely available, editable and searchable online.

[Snip]

"IMPACT is remarkable in that it not only allows these prominent centers of culture to ultimately bring people closer to perhaps never before seen historically significant texts of heritage -- but because it actually allows these people to become part of the preservation process," said Tal Drory, manager of the document processing group at IBM Research in Haifa. "IMPACT offers the first digitization system that combines the power of crowd computing with an adaptive optical character recognition (OCR) correction solution that can achieve excellent recognition rates across all kinds of documents – from the 15th century right up through the 19th century."

[Snip]

IMPACT technology streamlines, simplifies and accelerates the process of winnowing out questionable text scans, enabling reviewers to key in corrections to the text. Instead of displaying an entire scanned page, reviewers only see the actual letters or words in question. For example, the letter combination "r" and "n" ("rn") may appear indistinguishable from the letter "m." In those instances, the system collects many instances of the letter "m," and places these samples next to the letters in question, making it much easier to determine the letter's real identity.
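The "rn" versus "m" confusion described above can be illustrated with a small sketch. This is not IMPACT's actual method, which the release does not detail; it only shows one simple way to list the alternative readings a reviewer might be asked to confirm. The `CONFUSABLE_PAIRS` table and the `confusable_variants` function are hypothetical names introduced for illustration.

```python
# Hypothetical table of visually confusable sequences; only the pair
# mentioned in the release is included here.
CONFUSABLE_PAIRS = [("rn", "m")]


def confusable_variants(word):
    """Return alternative readings of `word`, swapping one confusable
    sequence at a time (in either direction)."""
    variants = set()
    for a, b in CONFUSABLE_PAIRS:
        for src, dst in ((a, b), (b, a)):
            start = word.find(src)
            while start != -1:
                # Replace only this single occurrence of the sequence.
                variants.add(word[:start] + dst + word[start + len(src):])
                start = word.find(src, start + 1)
    variants.discard(word)
    return sorted(variants)


print(confusable_variants("modern"))
```

A word with no confusable sequence yields an empty list, so under this assumption it would not need character-level review.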

In cases where an entire word is suspect, it is added to a collection of other questionable terms, which are then arranged in alphabetical order. Volunteer reviewers need only accept or reject suggested substitutes with one keystroke. In addition, the system uses adaptive dictionary enrichment, a method in which new words are added to a central dictionary based on cross-identification and correction by other users.
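The word-level workflow in this paragraph can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the consortium's implementation: the `ReviewQueue` class and its methods are hypothetical, and the rule that only accepted substitutes enter the shared dictionary is an assumption, since the release does not say exactly when a word is added.

```python
class ReviewQueue:
    """Toy model of a suspect-word queue with dictionary enrichment."""

    def __init__(self, dictionary):
        self.dictionary = set(dictionary)  # shared "central" dictionary
        self.pending = {}                  # suspect word -> suggested substitute

    def add_suspect(self, word, suggestion):
        # Words already in the dictionary are trusted and skip review.
        if word not in self.dictionary:
            self.pending[word] = suggestion

    def in_review_order(self):
        # Suspect words are presented in alphabetical order.
        return sorted(self.pending.items(), key=lambda kv: kv[0].lower())

    def decide(self, word, accept):
        # One decision per word: accept or reject the suggested substitute.
        suggestion = self.pending.pop(word)
        if accept:
            # Assumption: an accepted substitute enriches the dictionary.
            self.dictionary.add(suggestion)
            return suggestion
        return word


queue = ReviewQueue({"the", "library"})
queue.add_suspect("moderne", "modern")
queue.add_suspect("bok", "book")
print(queue.in_review_order())
print(queue.decide("bok", accept=True))
```

Once "book" has been accepted, a later call such as `queue.add_suspect("book", "boot")` is ignored, which is the effect the enrichment step is meant to have in this sketch.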

For example, a small book that normally takes four hours to key in manually would take one hour using standard OCR technology with manual correction. Incorporating the new collaborative review technology cuts the process down to 30 minutes. IBM researchers explained that the new adaptive OCR system can further reduce the time, cutting it in half to 15 minutes.
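The figures quoted above can be restated as speedup factors relative to manual keying. The short calculation below uses only the times given in the release; the method labels and the `speedups` function are names chosen here for illustration.

```python
# Times from the release, converted to minutes.
MINUTES = {
    "manual keying": 240,
    "standard OCR with manual correction": 60,
    "collaborative review": 30,
    "adaptive OCR": 15,
}


def speedups(minutes, baseline="manual keying"):
    """Return how many times faster each method is than the baseline."""
    base = minutes[baseline]
    return {method: base / m for method, m in minutes.items()}


for method, factor in speedups(MINUTES).items():
    print(f"{method}: {MINUTES[method]} min, {factor:.0f}x vs. manual keying")
```

On these numbers, standard OCR with correction is 4 times faster than manual keying, collaborative review 8 times, and the adaptive OCR system 16 times.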

Learn More About The IMPACT Consortium

See Also: IMPACT's Twitter Feed

See Also: IMPACT Video (via YouTube)

Source: IBM

The consortium partners include, among others: IBM Research – Haifa, Koninklijke Bibliotheek, The British Library, Österreichische Nationalbibliothek, Universität Innsbruck, Deutsche Nationalbibliothek, Bayerische Staatsbibliothek, Staats- und Universitätsbibliothek Göttingen, ABBYY Production, Instituut voor Nederlandse Lexicologie, National Centre for Scientific Research "Demokritos," Centrum für Informations- und Sprachverarbeitung, University of Munich, University of Bath, University of Salford, Bibliothèque nationale de France, Biblioteca Nacional de España and Poznań Supercomputing and Networking Center in Poland.
