Here's one for the techies and IR geeks out there: a recent (June 15, 2006) presentation, sponsored by Microsoft Research, by Evgeniy Gabrilovich, a Ph.D. student in the Computer Science Department at the Technion (Israel Institute of Technology). The presentation runs 78 minutes, and PPT slides are available.
From the Description:
We propose to enrich document representation through automatic use of vast repositories of human knowledge. To this end, we use knowledge concepts derived from the Open Directory Project and Wikipedia, the largest Web directory and encyclopedia, respectively. In the preprocessing phase, a feature generator analyzes the input documents and maps them onto relevant concepts. The latter give rise to a set of generated features that augment the standard bag of words. Feature generation is accomplished through contextual analysis of document text, thus implicitly performing word sense disambiguation. Coupled with the ability to generalize from words to concepts, this approach addresses the two main problems of natural language processing: synonymy and polysemy. Categorizing documents with the aid of knowledge-based features leverages information that cannot be deduced from the training documents alone. Empirical results confirm that this knowledge-intensive representation brings text categorization to a qualitatively new level of performance across a diverse collection of datasets. We also propose a new, knowledge-based approach for computing the degree of semantic relatedness of texts.
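For readers who want a feel for the idea, here is a rough toy sketch in Python. It is not the presenter's implementation. The three-concept "knowledge base", the TF-IDF-style weighting, and every function name are assumptions made up for illustration; the real work draws on the Open Directory Project and Wikipedia. The sketch maps a text onto weighted concepts, adds the best-matching concept to the bag of words, and compares two texts by the cosine of their concept vectors.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base: concept name -> descriptive text.
# (Stands in for concepts derived from a large directory or encyclopedia.)
CONCEPTS = {
    "Banking": "bank money loan deposit interest account credit",
    "Rivers": "river bank water stream flow shore fishing",
    "Computing": "computer software program code memory data",
}

def tokenize(text):
    return text.lower().split()

def build_index(concepts):
    """Map each word to {concept: weight}, using a simple TF-IDF-style weight."""
    tfs = {name: Counter(tokenize(text)) for name, text in concepts.items()}
    df = Counter(word for tf in tfs.values() for word in tf)
    n = len(concepts)
    index = {}
    for name, tf in tfs.items():
        for word, count in tf.items():
            idf = math.log(n / df[word]) + 1.0  # words shared by concepts weigh less
            index.setdefault(word, {})[name] = count * idf
    return index

def concept_vector(text, index):
    """Score every concept against the whole text (its context)."""
    scores = Counter()
    for word, count in Counter(tokenize(text)).items():
        for name, weight in index.get(word, {}).items():
            scores[name] += count * weight
    return scores

def augment(text, index, top_k=1):
    """Bag of words plus the top_k generated concept features."""
    features = Counter(tokenize(text))
    for name, score in concept_vector(text, index).most_common(top_k):
        features["CONCEPT:" + name] = score
    return features

def relatedness(a, b, index):
    """Cosine similarity of the two texts' concept vectors."""
    va, vb = concept_vector(a, index), concept_vector(b, index)
    dot = sum(va[c] * vb[c] for c in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * \
           math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

index = build_index(CONCEPTS)
money_doc = augment("the bank raised the interest on my loan", index)
river_doc = augment("we went fishing on the river bank", index)
```

In this toy setup the ambiguous word "bank" ends up with a Banking feature in the first text and a Rivers feature in the second, because the surrounding words tip the concept scores (polysemy). Likewise, "loan interest" and "credit deposit" share no words yet map to the same concept, so `relatedness` rates them as closely related (synonymy, loosely speaking).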