Cross-language information retrieval with latent topic models trained on a comparable corpus
|Author(s)||Vulic I., De Smet W., Moens M.-F.|
|Published in||Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)|
|Keyword(s)||Comparable corpora, Cross-language retrieval, Document models, Multilingual retrieval, Topic models, Wikipedia (Extra: Computational linguistics, Infrared devices, Models, Software agents, Statistics, Translation (languages), Information retrieval)|
Cross-language information retrieval with latent topic models trained on a comparable corpus is a 2011 conference paper written in English by Vulic I., De Smet W., Moens M.-F. and published in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).
In this paper we study cross-language information retrieval using a bilingual topic model trained on comparable corpora such as Wikipedia articles. The bilingual Latent Dirichlet Allocation model (BiLDA) creates an interlingual representation, which can be used as a translation resource in many different multilingual settings, since comparable corpora are available for many language pairs. The probabilistic interlingual representation is incorporated into a statistical language model for information retrieval. Experiments on the English and Dutch test datasets of the CLEF 2001-2003 CLIR campaigns show that our approach is competitive with cross-language retrieval methods that rely on pre-existing translation dictionaries, whether hand-built or derived from parallel corpora.
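The retrieval model described in the abstract can be illustrated with a small sketch: documents in one language are represented by their mixtures over shared (interlingual) topics, and a query in the other language is scored by query likelihood, summing over topics. All names, values, and the two-topic vocabulary below are hypothetical toy data, not the paper's actual BiLDA parameters.

```python
import math

# Hypothetical toy model with 2 shared (interlingual) topics.
# phi_en[k][w]: probability of English word w under topic k,
# estimated on the English side of a comparable corpus (toy values).
phi_en = [
    {"election": 0.6, "vote": 0.3, "music": 0.1},    # topic 0: politics
    {"election": 0.05, "vote": 0.05, "music": 0.9},  # topic 1: music
]

# theta[d][k]: topic mixture of each Dutch document, inferred on the
# Dutch side; the topics themselves are shared across both languages.
theta = {
    "doc_politics_nl": [0.9, 0.1],
    "doc_music_nl":    [0.1, 0.9],
}

def query_log_likelihood(query_words, doc_theta):
    """Cross-language query likelihood:
    log P(q | d) = sum_w log sum_k P(w | topic k) * P(topic k | d)."""
    score = 0.0
    for w in query_words:
        p_w = sum(phi_en[k].get(w, 1e-6) * doc_theta[k]
                  for k in range(len(doc_theta)))
        score += math.log(p_w)
    return score

# An English query is matched against Dutch documents via the
# shared topic space, with no translation dictionary involved.
query = ["election", "vote"]
ranked = sorted(theta,
                key=lambda d: query_log_likelihood(query, theta[d]),
                reverse=True)
print(ranked[0])  # → doc_politics_nl
```

In the actual paper this topic-based score is one component of a statistical language model for retrieval (typically smoothed and combined with other evidence); the sketch keeps only the interlingual topic-matching step.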
This publication has been cited 1 time, but no articles citing it are available in WikiPapers.