Centroid-based classification enhanced with Wikipedia
|Author(s)||Bawakid A., Oussalah M.|
|Published in||Proceedings - 9th International Conference on Machine Learning and Applications, ICMLA 2010|
|Keyword(s)||Categorization, Classification, Component, Semantics, Text enrichment, Wikipedia (Extra: Information retrieval systems, Knowledge representation, Learning systems, Text processing)|
Centroid-based classification enhanced with Wikipedia is a 2010 conference paper written in English by Bawakid A. and Oussalah M., published in Proceedings - 9th International Conference on Machine Learning and Applications, ICMLA 2010.
Most traditional text classification methods employ Bag of Words (BOW) approaches that rely on the frequencies of words in the training corpus and the test documents. Recently, studies have examined using external knowledge to enrich the text representation of documents. Some have focused on WordNet, which suffers from several limitations, including the number of available words and synsets and its coverage. Other studies have used different aspects of Wikipedia instead. Depending on the features selected and evaluated and on the external knowledge used, a balance has to be struck between recall, precision, noise reduction and information loss. In this paper, we propose a new Centroid-based classification approach that relies on Wikipedia to enrich the representation of documents through the use of Wikipedia's concepts, category structure, links, and article text. We extract candidate concepts for each class with the help of Wikipedia and merge them with important features derived directly from the text documents. Different variations of the system were evaluated, and the results show improvements in the performance of the system.
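The general idea of a centroid-based classifier over enriched features can be illustrated with a minimal Python sketch. This is not the authors' system: the `TOY_CONCEPTS` dictionary is a hypothetical stand-in for concept extraction from Wikipedia, and the function names, the raw term counts and the cosine scoring are assumptions chosen for illustration only.

```python
import math
from collections import Counter

# Hypothetical stand-in for a Wikipedia lookup (term -> related concepts).
# A real system would derive these from Wikipedia itself.
TOY_CONCEPTS = {
    "python": ["Programming language"],
    "java": ["Programming language"],
    "goal": ["Association football"],
    "striker": ["Association football"],
}


def featurize(text, concepts=TOY_CONCEPTS):
    """Bag-of-words counts merged with counts of the mapped concepts."""
    tokens = text.lower().split()
    feats = Counter(tokens)
    for tok in tokens:
        for concept in concepts.get(tok, []):
            feats["concept:" + concept] += 1
    return feats


def normalize(vec):
    """Scale a sparse vector to unit length (empty if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}


def train_centroids(labeled_docs):
    """Sum the unit vectors of each class, then normalize the sum."""
    sums = {}
    for text, label in labeled_docs:
        acc = sums.setdefault(label, Counter())
        for k, v in normalize(featurize(text)).items():
            acc[k] += v
    return {label: normalize(acc) for label, acc in sums.items()}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def classify(text, centroids):
    """Return the label whose centroid is most similar to the document."""
    vec = normalize(featurize(text))
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


train = [
    ("the striker scored", "sport"),
    ("python code compiles", "tech"),
]
centroids = train_centroids(train)
print(classify("java tutorial", centroids))
```

In this toy setup, "java tutorial" shares no word with either training document, so a plain bag-of-words centroid would have nothing to match on. The shared concept feature is what links it to the "tech" class, which is the kind of effect text enrichment is meant to provide.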