The crawl uses the ALVIS focussed crawler, which is guided by keywords. A document counts as relevant to the crawl if it meets one of the following criteria:
relevance to statistical machine translation with key phrases: cross-?lingual information access, smt systems?, statistical machine translation, textual information access, statistical translation models?, cross-?lingual information retrieval, information extraction,
or both of:
relevance to machine learning with key phrases: machine learning, statistical learning, kernel methods?, string kernels?, rational kernels?, online learning, support vector machines?, SVM, principal component analysis, independent component analysis, PCA, ICA, discriminative language models?, canonical correlation analysis, margin-?based translation models?, statistical language, latent dirichlet, automatic processing,
+ relevance to machine translation with key phrases: machine translation, information retrieval, language models?, translation models?, computational linguistics, lexicon extraction, comprehension aids?, multilingual lexicon, user trials, user evaluation, parallel corpora, language modelling, computer aided translation, comprehension aids, multilingual lexicons?, multilingual corpora, cross-?language information retrieval, natural language processing, multilingual lexicon extraction, human language technology, machine translation technology, machine translation systems?, cross-?lingual information retrieval, linguistic resources.
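
The rule above can be read as "statistical machine translation, or (machine learning and machine translation)". The following is a minimal sketch of that reading, not the ALVIS crawler's actual implementation. It assumes the key phrases are regular expressions (so "?" makes the preceding character optional), that matching is case-insensitive, and that phrases match on word boundaries. The names compile_group and is_relevant are hypothetical, and each list is an abbreviated subset of the phrases given above.

```python
import re

# Abbreviated subsets of the key-phrase groups listed above.
SMT_PHRASES = ["cross-?lingual information access", "smt systems?",
               "statistical machine translation"]
ML_PHRASES = ["machine learning", "kernel methods?",
              "support vector machines?", "SVM"]
MT_PHRASES = ["machine translation", "language models?", "parallel corpora"]


def compile_group(phrases):
    """Combine a group of key phrases into one pattern (hypothetical helper).

    Assumption: case-insensitive matching on word boundaries.
    """
    return re.compile(r"\b(?:" + "|".join(phrases) + r")\b", re.IGNORECASE)


SMT_RE = compile_group(SMT_PHRASES)
ML_RE = compile_group(ML_PHRASES)
MT_RE = compile_group(MT_PHRASES)


def is_relevant(text):
    """Return True if text matches the SMT group, or both the ML and MT groups."""
    if SMT_RE.search(text):
        return True
    return bool(ML_RE.search(text)) and bool(MT_RE.search(text))
```

Under these assumptions, a text that mentions only machine learning, or only machine translation, is not relevant on its own; it needs a phrase from both groups, or any phrase from the statistical machine translation group.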