Cross-language retrieval

From WikiPapers

Cross-language retrieval is included as a keyword or extra keyword in 0 datasets, 0 tools and 4 publications.

Datasets

There are no datasets for this keyword.

Tools

There are no tools for this keyword.


Publications

Title: Cross-language information retrieval with latent topic models trained on a comparable corpus
Author(s): Vulic I., De Smet W., Moens M.-F.
Published in: Lecture Notes in Computer Science
Language: English
Date: 2011
Abstract: In this paper we study cross-language information retrieval using a bilingual topic model trained on comparable corpora such as Wikipedia articles. The bilingual Latent Dirichlet Allocation model (BiLDA) creates an interlingual representation, which can be used as a translation resource in many different multilingual settings as comparable corpora are available for many language pairs. The probabilistic interlingual representation is incorporated in a statistical language model for information retrieval. Experiments performed on the English and Dutch test datasets of the CLEF 2001-2003 CLIR campaigns show the competitive performance of our approach compared to cross-language retrieval methods that rely on pre-existing translation dictionaries that are hand-built or constructed based on parallel corpora.
R: 0
C: 0

Title: Cross-language retrieval using link-based language models
Author(s): Benjamin Roth, Dietrich Klakow
Published in: SIGIR 2010 Proceedings - 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Language: English
Date: 2010
Abstract: We propose a cross-language retrieval model that is solely based on Wikipedia as a training corpus. The main contributions of our work are: 1. A translation model based on linked text in Wikipedia and a term weighting method associated with it. 2. A combination scheme to interpolate the link translation model with retrieval based on Latent Dirichlet Allocation. On the CLEF 2000 data we achieve improvement with respect to the best German-English system at the bilingual track (non-significant) and improvement against a baseline based on machine translation (significant).
R: 0
C: 0

Title: Crosslanguage Retrieval Based on Wikipedia Statistics
Author(s): Andreas Juffinger, Roman Kern, Michael Granitzer
Published in: Lecture Notes in Computer Science
Language: English
Date: 2009
Abstract: In this paper we present the methodology, implementations and evaluation results of the crosslanguage retrieval system we have developed for the Robust WSD Task at CLEF 2008. Our system is based on query preprocessing for translation and homogenisation of queries. The presented preprocessing of queries includes two stages: Firstly, a query translation step based on term statistics of cooccuring articles in Wikipedia. Secondly, different disjunct query composition techniques to search in the CLEF corpus. We apply the same preprocessing steps for the monolingual as well as the crosslingual task and thereby acting fair and in a similar way across these tasks. The evaluation revealed that the similar processing comes at nearly no costs for monolingual retrieval but enables us to do crosslanguage retrieval and also a feasible comparison of our system performance on these two tasks.
R: 0
C: 0

Title: Cross-language retrieval with wikipedia
Author(s): Schonhofen P., Benczur A., Biro I., Csalogany K.
Published in: Lecture Notes in Computer Science
Language: English
Date: 2008
Abstract: We demonstrate a twofold use of Wikipedia for cross-lingual information retrieval. As our main contribution, we exploit Wikipedia hyperlinkage for query term disambiguation. We also use bilingual Wikipedia articles for dictionary extension. Our method is based on translation disambiguation; we combine the Wikipedia based technique with a method based on bigram statistics of pairs formed by translations of different source language terms.
R: 0
C: 0