A Wikipedia-Based Multilingual Retrieval Model
Abstract This paper introduces CL-ESA, a new multilingual retrieval model for the analysis of cross-language similarity. The retrieval model exploits the multilingual alignment of Wikipedia: given a document d written in language L, we construct a concept vector d for d, where each dimension i in d quantifies the similarity of d with respect to a document chosen from the “L-subset” of Wikipedia. Likewise, for a second document d′ written in language L′, we construct a concept vector d′, using the topic-aligned counterparts of our previously chosen documents from the L′-subset of Wikipedia. Since the two concept vectors d and d′ are collection-relative representations of d and d′, they are language-independent. I.e., their similarity can be computed directly, for instance with the cosine similarity measure. We present results of an extensive analysis that demonstrates the power of this new retrieval model: for a query document d, the topically most similar documents from a corpus in another language are properly ranked. A salient property of the new retrieval model is its robustness with respect to both the size and the quality of the index document collection.
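The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenizer, the raw term-frequency weighting, the function names, and the three-document "aligned index" are all assumptions made for illustration, standing in for topic-aligned Wikipedia articles in languages L and L′.

```python
import math
from collections import Counter

def tf_vector(text):
    """Term-frequency bag of words (assumed weighting; a real system may use tf-idf)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dict-like objects."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def concept_vector(doc, index_docs):
    """Dimension i holds the similarity of doc to the i-th index document."""
    doc_vec = tf_vector(doc)
    return {i: cosine(doc_vec, tf_vector(a)) for i, a in enumerate(index_docs)}

def cl_esa_similarity(d, d_prime, index_l, index_l_prime):
    """Compare d (language L) and d_prime (language L') via their concept vectors.

    index_l[i] and index_l_prime[i] must describe the same topic.
    """
    return cosine(concept_vector(d, index_l), concept_vector(d_prime, index_l_prime))

# Hypothetical topic-aligned index: position i covers the same topic in both languages.
index_en = ["dog animal pet bark", "car engine road drive", "music song guitar band"]
index_de = ["hund tier haustier bellen", "auto motor strasse fahren", "musik lied gitarre band"]

query_en = "my dog is a loyal pet"
doc_de_dog = "der hund ist ein treues haustier"
doc_de_car = "das auto hat einen starken motor"

sim_dog = cl_esa_similarity(query_en, doc_de_dog, index_en, index_de)
sim_car = cl_esa_similarity(query_en, doc_de_car, index_en, index_de)
print(sim_dog > sim_car)  # the topically related German document ranks higher
```

Because both concept vectors are indexed by position in the aligned collection rather than by vocabulary, the final cosine never compares English terms with German terms directly; only the per-language comparisons against the index documents are language-specific.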
Bibtextype misc  +
Citeulike 3112610  +
Doi 10.1007/978-3-540-78646-7_51  +
Has author Martin Potthast + , Benno Stein + , Maik Anderka +
Has remote mirror http://www.uni-weimar.de/medien/webis/publications/papers/stein_2008b.pdf  +
Language English +
Number of citations by publication 0  +
Number of references by publication 0  +
Pages 522-530  +
Peer-reviewed Yes  +
Published in 30th European Conference on IR Research (ECIR 08) +
Title A Wikipedia-Based Multilingual Retrieval Model +
Type unknown  +
Year 2008 +
Creation date 29 January 2012 12:58:27  +
Categories Publications without keywords parameter  + , Publications without license parameter  + , Publications without archive mirror parameter  + , Publications without paywall mirror parameter  + , Publications without references parameter  + , Publications  +
Modification date 19 March 2012 15:21:43  +
Date 2008  +