Identifying document topics using the Wikipedia category network
Abstract In the last few years the size and coverage of Wikipedia, a community edited, freely available on-line encyclopedia, has reached the point where it can be effectively used to identify topics discussed in a document, similarly to an ontology or taxonomy. In this paper we will show that even a fairly simple algorithm that exploits only the titles and categories of Wikipedia articles can characterize documents by Wikipedia categories surprisingly well. We test the reliability of our method by predicting categories of Wikipedia articles themselves based on their bodies, and also by performing classification and clustering on 20 Newsgroups and RCV1, representing documents by their Wikipedia categories instead of (or in addition to) their texts.
Bibtextype misc
Citeulike 6753783
Doi 10.3233/WIA-2009-0162
Has author Peter Schönhofen
Has keyword Wikipedia, Ontology, Classification, Clustering
Has paywall mirror http://iospress.metapress.com/content/03136m2h7u07xh00/
Issn 1570-1263
Issue 2
Language English
Number of citations by publication 1
Number of references by publication 0
Pages 195-207
Published in Web Intelli. and Agent Sys.
Title Identifying document topics using the Wikipedia category network
Type unknown
Volume 7
Year 2009
Creation date 29 January 2012 13:50:31
Categories Publications without license parameter, Publications without remote mirror parameter, Publications without archive mirror parameter, Publications without references parameter, Publications
Modification date 9 February 2012 21:15:23
Date 2009
Properties that link here
Social networks of Wikipedia + Has reference
Identifying document topics using the Wikipedia category network + Title
 

 
