Centroid-based classification enhanced with Wikipedia
Abstract Most traditional text classification methods employ Bag of Words (BOW) approaches that rely on the frequencies of words in the training corpus and the testing documents. Recently, studies have examined using external knowledge to enrich the text representation of documents. Some have focused on using WordNet, which suffers from several limitations, including the number of available words, synsets and coverage. Other studies used different aspects of Wikipedia instead. Depending on the features being selected and evaluated and the external knowledge being used, a balance between recall, precision, noise reduction and information loss has to be struck. In this paper, we propose a new centroid-based classification approach relying on Wikipedia to enrich the representation of documents through the use of Wikipedia's concepts, category structure, links, and article text. We extract candidate concepts for each class with the help of Wikipedia and merge them with important features derived directly from the text documents. Different variations of the system were evaluated and the results show improvements in the performance of the system.
Bibtextype inproceedings
Doi 10.1109/ICMLA.2010.17
Has author Abdullah Bawakid, Mourad Oussalah
Has extra keyword Categorization, Classification, Component, Text enrichment, Wikipedia, Information retrieval systems, Knowledge representation, Learning systems, Semantics, Text processing
Has keyword Categorization, Classification, Component, Semantics, Text enrichment, Wikipedia
Isbn 9780769543000
Language English
Number of citations by publication 0
Number of references by publication 0
Pages 65–70
Published in Proceedings - 9th International Conference on Machine Learning and Applications, ICMLA 2010
Title Centroid-based classification enhanced with Wikipedia
Type conference paper
Year 2010
Creation date 7 November 2014 03:37:51
Categories Publications without license parameter, Publications without remote mirror parameter, Publications without archive mirror parameter, Publications without paywall mirror parameter, Conference papers, Publications without references parameter, Publications
Modification date 7 November 2014 03:37:51
Date 2010