Improving text categorization with semantic knowledge in Wikipedia
|Author(s)||Wang X., Jia Y., Chen K., Fan H., Zhou B.|
|Published in||IEICE Transactions on Information and Systems|
|Keyword(s)||Document representation, Semantic matrix, Text categorization, Wikipedia (Extra: Document Representation, Experimental evaluation, Semantic relationships, Text categorization, Text classification, Text classification methods, Text representation, Wikipedia, Classification (of information), Semantics, Text processing)|
Improving text categorization with semantic knowledge in Wikipedia is a 2013 paper written in English by Wang X., Jia Y., Chen K., Fan H., Zhou B. and published in IEICE Transactions on Information and Systems.
Text categorization, especially short text categorization, is a difficult and challenging task since the text data is sparse and multidimensional. In traditional text classification methods, document texts are represented with Bag of Words (BOW) text representation schema, which is based on word co-occurrence and has many limitations. In this paper, we mapped document texts to Wikipedia concepts and used the Wikipedia-concept-based document representation method to take the place of traditional BOW model for text classification. In order to overcome the weakness of ignoring the semantic relationships among terms in document representation model and utilize rich semantic knowledge in Wikipedia, we constructed a semantic matrix to enrich Wikipedia-concept-based document representation. Experimental evaluation on five real datasets of long and short text shows that our approach outperforms the traditional BOW method.
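The abstract does not say how the concept mapping or the semantic matrix is built, so the following is only a minimal sketch of the general idea. It assumes a document is mapped to a vector of Wikipedia-concept counts, and that this vector is then multiplied by a concept-by-concept relatedness matrix. The lookup table `CONCEPT_INDEX`, the concept list `CONCEPTS`, the matrix values in `S`, and the helper names `concept_vector` and `enrich` are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical term-to-concept lookup, standing in for whatever
# Wikipedia-based mapping the authors actually used.
CONCEPT_INDEX = {
    "jaguar": "Jaguar_Cars",
    "sedan": "Sedan_(automobile)",
    "engine": "Engine",
}

# Fixed concept order that defines the vector dimensions.
CONCEPTS = ["Jaguar_Cars", "Sedan_(automobile)", "Engine"]

# Hypothetical semantic matrix: S[i][j] is an assumed relatedness score
# between concept i and concept j (1.0 on the diagonal).
S = [
    [1.0, 0.6, 0.4],
    [0.6, 1.0, 0.5],
    [0.4, 0.5, 1.0],
]


def concept_vector(text):
    """Count how often each known concept is mentioned in the text."""
    tokens = text.lower().split()
    counts = Counter(CONCEPT_INDEX[t] for t in tokens if t in CONCEPT_INDEX)
    return [float(counts[c]) for c in CONCEPTS]


def enrich(vec, matrix):
    """Multiply the row vector by the semantic matrix (vec x matrix)."""
    n = len(vec)
    return [sum(vec[i] * matrix[i][j] for i in range(n)) for j in range(n)]


if __name__ == "__main__":
    d = concept_vector("Jaguar sedan")
    print(d)             # concept counts before enrichment
    print(enrich(d, S))  # related concepts now receive non-zero weight
```

In this toy setup, a short text that never mentions an engine still gets a non-zero weight on the `Engine` dimension after enrichment. That is the kind of sparsity relief the abstract attributes to the semantic matrix.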