Improving semi-supervised text classification by using wikipedia knowledge
|Author(s)||Zhang Z., Lin H., Li P., Wang H., Lu D.|
|Published in||Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)|
|Keyword(s)||Clustering Based Classification, Semi-supervised Text Classification, Wikipedia (Extra: Classification methods, Document Representation, Labeled and unlabeled data, Semantic relationships, Text classification, Vector space models, Wikipedia, Wikipedia knowledge, Information management, Information retrieval systems, Semantics, Text processing, Vector spaces, Classification (of information))|
Improving semi-supervised text classification by using wikipedia knowledge is a 2013 conference paper written in English by Zhang Z., Lin H., Li P., Wang H., Lu D. and published in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).
Semi-supervised text classification uses both labeled and unlabeled data to construct classifiers. The key issue is how to utilize the unlabeled data. The clustering-based classification method outperforms other semi-supervised text classification algorithms. However, its gains are still limited because the vector space model representation largely ignores the semantic relationships between words. In this paper, we propose a new approach that addresses this problem by using Wikipedia knowledge. We enrich the document representation with Wikipedia semantic features (concepts and categories), propose a new similarity measure based on the semantic relevance between Wikipedia features, and apply this similarity measure to clustering-based classification. Experimental results on several corpora show that the proposed method can effectively improve semi-supervised text classification performance.
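The abstract does not give the exact similarity measure, so the following is only a rough sketch of the general idea: a bag-of-words vector is enriched with concept features, and document similarity mixes plain term cosine with a concept-level score that credits related (not only identical) concepts. The word-to-concept table, the relatedness values, the mixing weight `alpha`, and the use of a soft-cosine form are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter

# Hypothetical toy mapping from words to Wikipedia concepts (an assumption;
# the abstract does not describe how the paper maps text to concepts).
WORD_TO_CONCEPT = {
    "puma": "Cougar",
    "cougar": "Cougar",
    "jaguar": "Jaguar",
    "engine": "Engine",
}

# Hypothetical semantic relatedness between distinct concepts, in [0, 1].
CONCEPT_RELATEDNESS = {
    frozenset(["Cougar", "Jaguar"]): 0.8,
}


def relatedness(a, b):
    """Relatedness of two concepts: 1.0 if identical, table lookup otherwise."""
    if a == b:
        return 1.0
    return CONCEPT_RELATEDNESS.get(frozenset([a, b]), 0.0)


def term_vector(text):
    """Plain bag-of-words term counts (the vector space model part)."""
    return Counter(text.lower().split())


def concept_vector(terms):
    """Concept counts obtained by mapping each known word to its concept."""
    concepts = Counter()
    for word, count in terms.items():
        concept = WORD_TO_CONCEPT.get(word)
        if concept is not None:
            concepts[concept] += count
    return concepts


def cosine(u, v):
    """Standard cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def soft_cosine(u, v):
    """Cosine generalized with pairwise relatedness between features."""
    def bilinear(a, b):
        return sum(a[i] * b[j] * relatedness(i, j) for i in a for j in b)

    norm = math.sqrt(bilinear(u, u) * bilinear(v, v))
    return bilinear(u, v) / norm if norm else 0.0


def enriched_similarity(doc_a, doc_b, alpha=0.5):
    """Weighted mix of term-level cosine and concept-level soft cosine."""
    terms_a, terms_b = term_vector(doc_a), term_vector(doc_b)
    concepts_a, concepts_b = concept_vector(terms_a), concept_vector(terms_b)
    return alpha * cosine(terms_a, terms_b) + (1 - alpha) * soft_cosine(
        concepts_a, concepts_b
    )
```

With these toy tables, `enriched_similarity("the puma hunts", "a jaguar sleeps")` is nonzero even though the two documents share no words, which is the kind of semantic relationship a purely term-based vector space model misses. A similarity of this shape could then be supplied to a clustering step in place of plain cosine.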
This publication is probably cited by other works, but WikiPapers has no articles for them.