Overcoming the brittleness bottleneck using Wikipedia: Enhancing text categorization with encyclopedic knowledge
|Author(s)||Gabrilovich E., Markovitch S.|
|Published in||Proceedings of the National Conference on Artificial Intelligence|
|Keyword(s)||Unknown (Extra: Artificial intelligence, Information retrieval, Knowledge representation, Learning systems, Online searching, Word processing, Encyclopedias, RFID technology, Text categorization, Wikipedia, Knowledge based systems)|
Overcoming the brittleness bottleneck using Wikipedia: Enhancing text categorization with encyclopedic knowledge is a 2006 conference paper written in English by E. Gabrilovich and S. Markovitch, published in the Proceedings of the National Conference on Artificial Intelligence.
When humans approach the task of text categorization, they interpret the specific wording of the document in the much larger context of their background knowledge and experience. On the other hand, state-of-the-art information retrieval systems are quite brittle - they traditionally represent documents as bags of words, and are restricted to learning from individual word occurrences in the (necessarily limited) training set. For instance, given the sentence "Wal-Mart supply chain goes real time", how can a text categorization system know that Wal-Mart manages its stock with RFID technology? And having read that "Ciprofloxacin belongs to the quinolones group", how on earth can a machine know that the drug mentioned is an antibiotic produced by Bayer? In this paper we present algorithms that can do just that. We propose to enrich document representation through automatic use of a vast compendium of human knowledge - an encyclopedia. We apply machine learning techniques to Wikipedia, the largest encyclopedia to date, which surpasses in scope many conventional encyclopedias and provides a cornucopia of world knowledge. Each Wikipedia article represents a concept, and documents to be categorized are represented in the rich feature space of words and relevant Wikipedia concepts. Empirical results confirm that this knowledge-intensive representation brings text categorization to a qualitatively new level of performance across a diverse collection of datasets. Copyright © 2006, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
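The idea of representing a document in a combined feature space of words and Wikipedia concepts can be illustrated with a minimal sketch. This is not the authors' implementation: the three-article toy "encyclopedia", the relative-frequency weighting, and the names `TOY_ARTICLES`, `build_inverted_index`, `top_concepts`, and `enrich` are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy "encyclopedia": concept title -> article text.
# Real Wikipedia articles are far larger; these strings are invented.
TOY_ARTICLES = {
    "Wal-Mart": "wal-mart retailer supply chain stock rfid stores",
    "RFID": "rfid radio frequency identification tags stock tracking supply chain",
    "Antibiotic": "antibiotic drug bacteria infection quinolones ciprofloxacin",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_inverted_index(articles):
    """Map each word to {concept: relative frequency of the word in that article}."""
    index = {}
    for concept, text in articles.items():
        counts = Counter(tokenize(text))
        total = sum(counts.values())
        for word, n in counts.items():
            index.setdefault(word, {})[concept] = n / total
    return index


def top_concepts(doc, index, k=2):
    """Score each concept by summing word weights over the document's words."""
    scores = Counter()
    for word, n in Counter(tokenize(doc)).items():
        for concept, weight in index.get(word, {}).items():
            scores[concept] += n * weight
    return [concept for concept, _ in scores.most_common(k)]


def enrich(doc, index, k=2):
    """Return bag-of-words features augmented with the top-k concept features."""
    features = Counter(tokenize(doc))
    for concept in top_concepts(doc, index, k):
        features["CONCEPT:" + concept] += 1
    return features


if __name__ == "__main__":
    index = build_inverted_index(TOY_ARTICLES)
    print(enrich("Wal-Mart supply chain goes real time", index))
```

In this toy setup, the sentence from the abstract gains an RFID concept feature even though the word "RFID" never occurs in it, which is the kind of background knowledge a plain bag-of-words representation cannot supply.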
Cited by
This publication has 1 citation. Only those publications available in WikiPapers are shown here:
Cited 71 time(s)