Concept-based information retrieval using explicit semantic analysis
Abstract Information retrieval systems traditionally rely on textual keywords to index and retrieve documents. Keyword-based retrieval may return inaccurate and incomplete results when different keywords are used to describe the same concept in the documents and in the queries. Furthermore, the relationship between these related keywords may be semantic rather than syntactic, and capturing it thus requires access to comprehensive human world knowledge. Concept-based retrieval methods have attempted to tackle these difficulties by using manually built thesauri, by relying on term co-occurrence data, or by extracting latent word relationships and concepts from a corpus. In this article we introduce a new concept-based retrieval approach based on Explicit Semantic Analysis (ESA), a recently proposed method that augments keyword-based text representation with concept-based features, automatically extracted from massive human knowledge repositories such as Wikipedia. Our approach generates new text features automatically, and we have found that high-quality feature selection becomes crucial in this setting to make the retrieval more focused. However, due to the lack of labeled data, traditional feature selection methods cannot be used, hence we propose new methods that use self-generated labeled training data. The resulting system is evaluated on several TREC datasets, showing superior performance over previous state-of-the-art results.
Bibtextype article  +
Doi 10.1145/1961209.1961211  +
Has author Egozi O. + , Shaul Markovitch + , Evgeniy Gabrilovich +
Has extra keyword Concept-based + , Concept-based retrieval + , Dataset + , Explicit semantics + , Feature selection methods + , High quality + , Human knowledge + , Keyword-based retrieval + , Labeled data + , Labeled training data + , Semantic search + , Term co-occurrence + , Text feature + , Text representation + , Wikipedia + , World knowledge + , Feature extraction + , Information retrieval systems + , Knowledge representation + , Search engine + , Semantics + , Information retrieval +
Has keyword Concept-based retrieval + , Explicit semantic analysis + , Feature selection + , Semantic search +
Issn 10468188  +
Issue 2  +
Language English +
Number of citations by publication 0  +
Number of references by publication 0  +
Published in ACM Transactions on Information Systems +
Title Concept-based information retrieval using explicit semantic analysis +
Type journal article  +
Volume 29  +
Year 2011 +
Creation date 7 November 2014 06:36:08  +
Categories Publications without license parameter  + , Publications without remote mirror parameter  + , Publications without archive mirror parameter  + , Publications without paywall mirror parameter  + , Journal articles  + , Publications without references parameter  + , Publications  +
Modification date 7 November 2014 06:36:08  +
Date 2011  +