Improving text categorization with semantic knowledge in wikipedia

Improving text categorization with semantic knowledge in wikipedia is a 2013 conference paper written in English by Wang X., Jia Y., Chen K., Fan H., Zhou B. and published in IEICE Transactions on Information and Systems.

Abstract

Text categorization, especially short text categorization, is a difficult and challenging task since the text data is sparse and multidimensional. In traditional text classification methods, document texts are represented with Bag of Words (BOW) text representation schema, which is based on word co-occurrence and has many limitations. In this paper, we mapped document texts to Wikipedia concepts and used the Wikipedia-concept-based document representation method to take the place of traditional BOW model for text classification. In order to overcome the weakness of ignoring the semantic relationships among terms in document representation model and utilize rich semantic knowledge in Wikipedia, we constructed a semantic matrix to enrich Wikipedia-concept-based document representation. Experimental evaluation on five real datasets of long and short text shows that our approach outperforms the traditional BOW method.
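
The abstract does not say how the concept mapping or the semantic matrix is built, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a document is first turned into a vector of Wikipedia-concept counts and then multiplied by a concept-to-concept relatedness matrix, so that related concepts receive weight even when they are not mentioned. The concept list, the phrase dictionary, the matrix values, and the function names `concept_vector` and `enrich` are all hypothetical.

```python
# Toy illustration only: every name and number below is an assumption,
# not data or an algorithm taken from the paper.

# Hypothetical concept vocabulary (stand-in for Wikipedia article titles).
CONCEPTS = ["Apple Inc.", "IPhone", "Fruit"]

# Hypothetical phrase -> concept dictionary (stand-in for a real
# Wikipedia title/anchor-text index).
PHRASE_TO_CONCEPT = {
    "apple": "Apple Inc.",
    "iphone": "IPhone",
    "fruit": "Fruit",
}

# Assumed concept-to-concept relatedness scores; row i and column j
# follow the order of CONCEPTS.
SEMANTIC_MATRIX = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.0],
    [0.1, 0.0, 1.0],
]


def concept_vector(text):
    """Count how often each known concept is triggered by a token."""
    counts = [0.0] * len(CONCEPTS)
    for token in text.lower().split():
        concept = PHRASE_TO_CONCEPT.get(token)
        if concept is not None:
            counts[CONCEPTS.index(concept)] += 1.0
    return counts


def enrich(vector, matrix):
    """Multiply a row vector by the relatedness matrix (vector x matrix)."""
    return [
        sum(vector[i] * matrix[i][j] for i in range(len(vector)))
        for j in range(len(matrix[0]))
    ]


raw = concept_vector("new iphone released")
enriched = enrich(raw, SEMANTIC_MATRIX)
print(raw)       # concept counts before enrichment
print(enriched)  # counts after spreading weight to related concepts
```

In this toy case the short text mentions only one concept, yet the enriched vector also gives weight to a related concept. That is the kind of sparsity relief the abstract attributes to the semantic matrix; how relatedness is actually computed from Wikipedia is left open here.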
