Identifying Document Topics Using the Wikipedia Category Network
|Published in||WI '06: Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence|
|Keyword(s)||Retrieval models, Algorithms|
Identifying Document Topics Using the Wikipedia Category Network is a 2006 conference paper by Peter Schonhofen, published in WI '06: Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence.
In the last few years, the size and coverage of Wikipedia, a freely available online encyclopedia, have reached the point where it can be used much like an ontology or taxonomy to identify the topics discussed in a document. In this paper we show that even a simple algorithm that exploits only the titles and categories of Wikipedia articles can characterize documents by Wikipedia categories surprisingly well. We test the reliability of our method by predicting the categories of Wikipedia articles themselves from their bodies, and by performing classification and clustering on 20 Newsgroups and RCV1, representing documents by their Wikipedia categories instead of their texts.
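The abstract's core idea, characterizing a document by the categories of the Wikipedia articles whose titles it mentions, can be illustrated with a minimal sketch. This is not the paper's actual scoring scheme: the toy `ARTICLE_CATEGORIES` mapping, the 3-word n-gram limit, the plain count weighting, and the helper names `category_profile` and `cosine` are all hypothetical assumptions for illustration. The final cosine step shows, under the same assumptions, how documents represented by category scores instead of words could be compared for clustering or classification.

```python
from collections import Counter
import math
import re

# HYPOTHETICAL toy snapshot: article title -> categories of that article.
# Real data would be extracted from a Wikipedia dump; these entries are made up.
ARTICLE_CATEGORIES = {
    "neural network": {"Machine learning", "Artificial intelligence"},
    "support vector machine": {"Machine learning", "Classification algorithms"},
    "ontology": {"Knowledge representation", "Artificial intelligence"},
    "encyclopedia": {"Reference works"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def matched_titles(tokens, titles, max_len=3):
    """Count article titles that occur as contiguous word n-grams (n <= max_len)."""
    found = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in titles:
                found[phrase] += 1
    return found


def category_profile(text, article_categories=ARTICLE_CATEGORIES):
    """Score each category by how often titles of its member articles appear.

    Assumed weighting: every title occurrence adds 1 to each of its categories.
    """
    titles = matched_titles(tokenize(text), article_categories.keys())
    scores = Counter()
    for title, count in titles.items():
        for category in article_categories[title]:
            scores[category] += count
    return scores


def cosine(a, b):
    """Cosine similarity between two sparse category-score vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    doc_a = "A support vector machine and a neural network were compared."
    doc_b = "An ontology organizes knowledge much like an encyclopedia."
    profile_a = category_profile(doc_a)
    profile_b = category_profile(doc_b)
    print(profile_a.most_common(1))  # [('Machine learning', 2)]
    print(round(cosine(profile_a, profile_b), 3))  # 0.236
```

Matching whole titles as n-grams keeps the sketch close to the "titles and categories only" premise; a fuller version would presumably need to handle redirects, ambiguous titles, and category weighting, none of which is shown here.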