Evgeniy Gabrilovich

From WikiPapers

Evgeniy Gabrilovich is an author.

Publications

Only those publications related to wikis are shown here.
Title: Trust, but verify: Predicting contribution quality for knowledge base construction and curation
Keywords: Crowdsourcing; Knowledge base construction; Predicting contribution quality
Published in: WSDM 2014 - Proceedings of the 7th ACM International Conference on Web Search and Data Mining
Language: English
Date: 2014
Abstract: The largest publicly available knowledge repositories, such as Wikipedia and Freebase, owe their existence and growth to volunteer contributors around the globe. While the majority of contributions are correct, errors can still creep in, due to editors' carelessness, misunderstanding of the schema, malice, or even lack of accepted ground truth. If left undetected, inaccuracies often degrade the experience of users and the performance of applications that rely on these knowledge repositories. We present a new method, CQUAL, for automatically predicting the quality of contributions submitted to a knowledge base. Significantly expanding upon previous work, our method holistically exploits a variety of signals, including the user's domains of expertise as reflected in her prior contribution history, and the historical accuracy rates of different types of facts. In a large-scale human evaluation, our method exhibits precision of 91% at 80% recall. Our model verifies whether a contribution is correct immediately after it is submitted, significantly alleviating the need for post-submission human reviewing.
R: 0, C: 0

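The abstract describes CQUAL only at a high level. As a rough illustration of that idea, the hedged Python sketch below trains a simple classifier over contributor-history and fact-type signals; the feature names, toy data, and choice of logistic regression are assumptions made for illustration, not details taken from the paper.

```python
# Minimal sketch (not the paper's CQUAL implementation): predict whether a
# knowledge-base contribution is correct from signals like those the abstract
# mentions. All feature names and training examples below are hypothetical.
from sklearn.linear_model import LogisticRegression

def featurize(contribution):
    """Turn one contribution into a feature vector.

    Assumed (hypothetical) input keys:
      - user_accuracy_in_domain: contributor's past accuracy in this topic area
      - user_total_edits: size of the contributor's prior edit history
      - fact_type_accuracy: historical accuracy rate for this type of fact
    """
    return [
        contribution["user_accuracy_in_domain"],
        contribution["user_total_edits"],
        contribution["fact_type_accuracy"],
    ]

# Toy labeled examples: 1 = contribution judged correct, 0 = incorrect.
training = [
    ({"user_accuracy_in_domain": 0.95, "user_total_edits": 400, "fact_type_accuracy": 0.90}, 1),
    ({"user_accuracy_in_domain": 0.40, "user_total_edits": 12,  "fact_type_accuracy": 0.60}, 0),
    ({"user_accuracy_in_domain": 0.85, "user_total_edits": 150, "fact_type_accuracy": 0.80}, 1),
    ({"user_accuracy_in_domain": 0.30, "user_total_edits": 5,   "fact_type_accuracy": 0.70}, 0),
]

X = [featurize(c) for c, _ in training]
y = [label for _, label in training]
model = LogisticRegression().fit(X, y)

new_contribution = {"user_accuracy_in_domain": 0.9, "user_total_edits": 200, "fact_type_accuracy": 0.85}
print("P(correct) =", model.predict_proba([featurize(new_contribution)])[0][1])
```
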
Title: Concept-based information retrieval using explicit semantic analysis
Keywords: Concept-based retrieval; Explicit semantic analysis; Feature selection; Semantic search
Published in: ACM Transactions on Information Systems
Language: English
Date: 2011
Abstract: Information retrieval systems traditionally rely on textual keywords to index and retrieve documents. Keyword-based retrieval may return inaccurate and incomplete results when different keywords are used to describe the same concept in the documents and in the queries. Furthermore, the relationship between these related keywords may be semantic rather than syntactic, and capturing it thus requires access to comprehensive human world knowledge. Concept-based retrieval methods have attempted to tackle these difficulties by using manually built thesauri, by relying on term co-occurrence data, or by extracting latent word relationships and concepts from a corpus. In this article we introduce a new concept-based retrieval approach based on Explicit Semantic Analysis (ESA), a recently proposed method that augments keyword-based text representation with concept-based features, automatically extracted from massive human knowledge repositories such as Wikipedia. Our approach generates new text features automatically, and we have found that high-quality feature selection becomes crucial in this setting to make the retrieval more focused. However, due to the lack of labeled data, traditional feature selection methods cannot be used, hence we propose new methods that use self-generated labeled training data. The resulting system is evaluated on several TREC datasets, showing superior performance over previous state-of-the-art results.
R: 0, C: 0

Title: Using the past to score the present: Extending term weighting models through Revision History Analysis
Keywords: Collaboratively generated content; Retrieval models; Term weighting
Published in: International Conference on Information and Knowledge Management, Proceedings
Language: English
Date: 2010
Abstract: The generative process underlies many information retrieval models, notably statistical language models. Yet these models only examine one (current) version of the document, effectively ignoring the actual document generation process. We posit that a considerable amount of information is encoded in the document authoring process, and this information is complementary to the word occurrence statistics upon which most modern retrieval models are based. We propose a new term weighting model, Revision History Analysis (RHA), which uses the revision history of a document (e.g., the edit history of a page in Wikipedia) to redefine term frequency - a key indicator of document topic/relevance for many retrieval models and text processing tasks. We then apply RHA to document ranking by extending two state-of-the-art text retrieval models, namely, BM25 and the generative statistical language model (LM). To the best of our knowledge, our paper is the first attempt to directly incorporate document authoring history into retrieval models. Empirical results show that RHA provides consistent improvements for state-of-the-art retrieval models, using standard retrieval tasks and benchmarks.
R: 0, C: 0

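The exact RHA weighting scheme is not spelled out in the abstract above, so the short Python sketch below only illustrates the general idea of a revision-aware term frequency. Averaging a term's frequency over all revisions (so terms that persist through the edit history outscore late additions) is an assumption made for illustration, as are the helper name and toy revision texts.

```python
# Illustrative sketch of a revision-history-aware term frequency; not the
# paper's actual RHA formula.
from collections import Counter

def revision_history_tf(revisions):
    """Average each term's normalized frequency over all revisions, so terms
    that persist throughout a page's edit history score higher than terms
    introduced only in the latest revision."""
    scores = Counter()
    for text in revisions:
        tokens = text.lower().split()
        for term, count in Counter(tokens).items():
            scores[term] += count / len(tokens)
    return {term: s / len(revisions) for term, s in scores.items()}

revisions = [
    "wikipedia is an encyclopedia",
    "wikipedia is a free encyclopedia",
    "wikipedia is a free online encyclopedia anyone can edit",
]
print(revision_history_tf(revisions))
```
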
Title: AAAI 2008 workshop reports
Published in: AI Magazine
Language: English
Date: 2009
Abstract: AAAI was pleased to present the AAAI-08 Workshop Program, held Sunday and Monday, July 13-14, in Chicago, Illinois, USA. The program included the following 15 workshops: Advancements in POMDP Solvers; AI Education Workshop Colloquium; Coordination, Organizations, Institutions, and Norms in Agent Systems; Enhanced Messaging; Human Implications of Human-Robot Interaction; Intelligent Techniques for Web Personalization and Recommender Systems; Metareasoning: Thinking about Thinking; Multidisciplinary Workshop on Advances in Preference Handling; Search in Artificial Intelligence and Robotics; Spatial and Temporal Reasoning; Trading Agent Design and Analysis; Transfer Learning for Complex Tasks; What Went Wrong and Why: Lessons from AI Research and Applications; and Wikipedia and Artificial Intelligence: An Evolving Synergy. Copyright © 2009, Association for the Advancement of Artificial Intelligence. All rights reserved.
R: 0, C: 0

Title: Wikipedia-based semantic interpretation for natural language processing
Published in: J. Artif. Int. Res.
Language: English
Date: 2009
Abstract: Adequate representation of natural language semantics requires access to vast amounts of common sense and domain-specific world knowledge. Prior work in the field was based on purely statistical techniques that did not make use of background knowledge, on limited lexicographic knowledge bases such as WordNet, or on huge manual efforts such as the CYC project. Here we propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts. Our method represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence. We explicitly represent the meaning of any text in terms of Wikipedia-based concepts. We evaluate the effectiveness of our method on text categorization and on computing the degree of semantic relatedness between fragments of natural language text. Using ESA results in significant improvements over the previous state of the art in both tasks. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users.
R: 0, C: 1

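As a rough illustration of the ESA representation described above, the sketch below maps a text to a weighted vector of Wikipedia concepts and compares two texts by cosine similarity. A real ESA index is built from the full Wikipedia dump with TF-IDF weights; the tiny hand-made term-to-concept table here is purely hypothetical.

```python
# Minimal ESA-style sketch: interpret texts as weighted vectors of Wikipedia
# concepts and measure relatedness as the cosine between those vectors.
import math
from collections import defaultdict

# Hypothetical inverted index: term -> {Wikipedia concept: association weight}.
TERM_CONCEPTS = {
    "jaguar": {"Jaguar (animal)": 0.9, "Jaguar Cars": 0.7},
    "cat":    {"Cat": 1.0, "Jaguar (animal)": 0.4, "Felidae": 0.8},
    "engine": {"Jaguar Cars": 0.6, "Internal combustion engine": 0.9},
}

def esa_vector(text):
    """Sum the concept vectors of a text's terms to get its concept vector."""
    vec = defaultdict(float)
    for term in text.lower().split():
        for concept, weight in TERM_CONCEPTS.get(term, {}).items():
            vec[concept] += weight
    return vec

def cosine(u, v):
    dot = sum(u[c] * v.get(c, 0.0) for c in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

print(cosine(esa_vector("jaguar cat"), esa_vector("jaguar engine")))
```
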
Title: Concept-based feature generation and selection for information retrieval
Published in: Proceedings of the National Conference on Artificial Intelligence
Language: English
Date: 2008
Abstract: Traditional information retrieval systems use query words to identify relevant documents. In difficult retrieval tasks, however, one needs access to a wealth of background knowledge. We present a method that uses Wikipedia-based feature generation to improve retrieval performance. Intuitively, we expect that using extensive world knowledge is likely to improve recall but may adversely affect precision. High quality feature selection is necessary to maintain high precision, but here we do not have the labeled training data for evaluating features that is available in supervised learning. We present a new feature selection method that is inspired by pseudo-relevance feedback. We use the top-ranked and bottom-ranked documents retrieved by the bag-of-words method as representative sets of relevant and non-relevant documents. The generated features are then evaluated and filtered on the basis of these sets. Experiments on TREC data confirm the superior performance of our method compared to the previous state of the art. Copyright © 2008, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
R: 0, C: 0

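The sketch below illustrates the pseudo-relevance-feedback-style filtering the abstract describes: features generated for the top-ranked documents are kept if they occur more often there than in the bottom-ranked documents. The frequency-difference score, the function name, and the toy data are assumptions for illustration, not the paper's exact criterion.

```python
# Illustrative feature selection using pseudo-relevant (top-ranked) and
# pseudo-non-relevant (bottom-ranked) document sets; scoring is a simple
# coverage difference, chosen here only for illustration.

def select_features(candidate_features, top_docs_features, bottom_docs_features, k=3):
    """candidate_features: generated features (e.g. Wikipedia concept names).
    top_docs_features / bottom_docs_features: one feature set per pseudo-relevant
    / pseudo-non-relevant document."""
    def coverage(feature, doc_feature_sets):
        return sum(feature in fs for fs in doc_feature_sets) / max(len(doc_feature_sets), 1)

    scored = {
        f: coverage(f, top_docs_features) - coverage(f, bottom_docs_features)
        for f in candidate_features
    }
    return sorted(scored, key=scored.get, reverse=True)[:k]

top = [{"RFID", "Supply chain", "Wal-Mart"}, {"RFID", "Barcode"}]
bottom = [{"Retail", "Wal-Mart"}, {"Stock market"}]
print(select_features({"RFID", "Supply chain", "Wal-Mart", "Stock market"}, top, bottom, k=2))
```
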
Title: Computing semantic relatedness using Wikipedia-based explicit semantic analysis
Published in: IJCAI
Language: English
Date: 2007
R: 0, C: 3

Title: Feature Generation for Textual Information Retrieval Using World Knowledge
Published in: Technion - Israel Institute of Technology
Language: English
Date: 2006
Abstract: Imagine an automatic news filtering system that tracks company news. Given the news item "FDA approves ciprofloxacin for victims of anthrax inhalation", how can the system know that the drug mentioned is an antibiotic produced by Bayer? Or consider an information professional searching for data on RFID technology - how can a computer understand that the item "Wal-Mart supply chain goes real time" is relevant for the search? The algorithms we present can do just that.
R: 0, C: 0

Title: Overcoming the brittleness bottleneck using Wikipedia: Enhancing text categorization with encyclopedic knowledge
Published in: Proceedings of the National Conference on Artificial Intelligence
Language: English
Date: 2006
Abstract: When humans approach the task of text categorization, they interpret the specific wording of the document in the much larger context of their background knowledge and experience. On the other hand, state-of-the-art information retrieval systems are quite brittle - they traditionally represent documents as bags of words, and are restricted to learning from individual word occurrences in the (necessarily limited) training set. For instance, given the sentence "Wal-Mart supply chain goes real time", how can a text categorization system know that Wal-Mart manages its stock with RFID technology? And having read that "Ciprofloxacin belongs to the quinolones group", how on earth can a machine know that the drug mentioned is an antibiotic produced by Bayer? In this paper we present algorithms that can do just that. We propose to enrich document representation through automatic use of a vast compendium of human knowledge - an encyclopedia. We apply machine learning techniques to Wikipedia, the largest encyclopedia to date, which surpasses in scope many conventional encyclopedias and provides a cornucopia of world knowledge. Each Wikipedia article represents a concept, and documents to be categorized are represented in the rich feature space of words and relevant Wikipedia concepts. Empirical results confirm that this knowledge-intensive representation brings text categorization to a qualitatively new level of performance across a diverse collection of datasets. Copyright © 2006, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
R: 0, C: 1

Title: Overcoming the brittleness bottleneck using Wikipedia: enhancing text categorization with encyclopedic knowledge
Published in: AAAI
Language: English
Date: 2006
R: 0, C: 1