A multi-layer text classification framework based on two-level representation model
|Author(s)||Yun J., Jing L., Yu J., Huang H.|
|Published in||Expert Systems with Applications|
|Keyword(s)||Multi-layer classification, Semantics, Text classification, Text representation, Wikipedia (Extra: Benchmark data, Classification framework, Classification methods, Concept-based, Inverse Document Frequency, Kernel models, Multi-layer classification, Representation model, Reuters-21578, Semantic information, Semantic levels, Structured data, Syntactic information, Term Frequency, Text categorization, Text classification, Text data, Text representation, Wikipedia, Classification (of information), Damage detection, Semantics, Syntactics, Text processing)|
A multi-layer text classification framework based on two-level representation model is a 2012 article written in English by Yun J., Jing L., Yu J. and Huang H., published in the journal Expert Systems with Applications.
Text categorization is one of the most common themes in the data mining and machine learning fields. Unlike structured data, unstructured text data is more difficult to analyze because it contains complicated syntactic and semantic information. In this paper, we propose a two-level representation model (2RM) to represent text data: one level represents syntactic information and the other represents semantic information. At the syntactic level, each document is represented as a term vector in which the value of each component is the term frequency-inverse document frequency (TF-IDF) weight of the term. At the semantic level, each document is represented by the Wikipedia concepts related to the terms of the syntactic level. We also design a multi-layer classification framework (MLCLA) to make use of the semantic and syntactic information represented in the 2RM model. The MLCLA framework contains three classifiers. Two of them are applied to the syntactic level and the semantic level in parallel. The outputs of these two classifiers are combined and fed to the third classifier, which produces the final results. Experimental results on benchmark data sets (20Newsgroups, Reuters-21578 and Classic3) show that the proposed 2RM model with the MLCLA framework improves text classification performance compared with existing flat text representation models (Term-based VSM, Term Semantic Kernel Model, Concept-based VSM, Concept Semantic Kernel Model and Term + Concept VSM) combined with existing classification methods. © 2011 Elsevier Ltd. All rights reserved.
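The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the `TERM_TO_CONCEPT` dictionary is a hypothetical stand-in for the Wikipedia concept lookup, the TF-IDF variant `tf * (ln(N/df) + 1)` is one common weighting (the abstract does not give the exact formula), the nearest-centroid classifier replaces whatever base classifiers the paper uses, and the outputs of the two first-layer classifiers are combined by simply concatenating their per-class scores.

```python
import math
from collections import Counter

# Hypothetical term-to-concept map; the paper derives concepts from Wikipedia.
TERM_TO_CONCEPT = {
    "goal": "Sport", "match": "Sport", "striker": "Sport",
    "cpu": "Computing", "kernel": "Computing", "compiler": "Computing",
}

def tfidf_vectors(docs, idf=None):
    """Return (vectors, idf) with one sparse TF-IDF dict per tokenised document."""
    if idf is None:  # fit IDF on the given documents (assumed variant: ln(N/df) + 1)
        df = Counter(t for doc in docs for t in set(doc))
        idf = {t: math.log(len(docs) / n) + 1.0 for t, n in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items() if t in idf}
               for doc in docs]
    return vectors, idf

def to_concepts(doc):
    """Semantic level: replace each known term by its related concept."""
    return [TERM_TO_CONCEPT[t] for t in doc if t in TERM_TO_CONCEPT]

def cosine(u, v):
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

class CentroidClassifier:
    """Nearest-centroid classifier over sparse vectors (illustrative stand-in)."""
    def fit(self, vectors, labels):
        self.centroids = {}
        for vec, label in zip(vectors, labels):
            centroid = self.centroids.setdefault(label, {})
            for key, w in vec.items():
                centroid[key] = centroid.get(key, 0.0) + w
        return self

    def scores(self, vec):
        return {label: cosine(vec, c) for label, c in self.centroids.items()}

    def predict(self, vec):
        s = self.scores(vec)
        return max(sorted(s), key=s.get)  # sorted() makes ties deterministic

class MultiLayerClassifier:
    """Two parallel first-layer classifiers feeding a third, final classifier."""
    def fit(self, docs, labels):
        term_vecs, self.term_idf = tfidf_vectors(docs)
        concept_vecs, self.concept_idf = tfidf_vectors([to_concepts(d) for d in docs])
        self.syntactic = CentroidClassifier().fit(term_vecs, labels)
        self.semantic = CentroidClassifier().fit(concept_vecs, labels)
        meta = [self._combine(t, c) for t, c in zip(term_vecs, concept_vecs)]
        self.final = CentroidClassifier().fit(meta, labels)
        return self

    def _combine(self, term_vec, concept_vec):
        # Assumed combination rule: concatenate the per-class scores of both levels.
        combined = {("syn", l): s for l, s in self.syntactic.scores(term_vec).items()}
        combined.update({("sem", l): s
                         for l, s in self.semantic.scores(concept_vec).items()})
        return combined

    def predict(self, doc):
        term_vec = tfidf_vectors([doc], self.term_idf)[0][0]
        concept_vec = tfidf_vectors([to_concepts(doc)], self.concept_idf)[0][0]
        return self.final.predict(self._combine(term_vec, concept_vec))

# Toy data, invented for this example only.
train_docs = [["goal", "match", "striker"], ["match", "goal", "referee"],
              ["cpu", "kernel", "compiler"], ["kernel", "cpu", "memory"]]
train_labels = ["sport", "sport", "tech", "tech"]
model = MultiLayerClassifier().fit(train_docs, train_labels)
print(model.predict(["striker", "goal"]))
```

In this sketch the third classifier is trained on the first-layer scores of the same training documents; a more careful setup would probably use held-out or cross-validated scores so that the final layer does not learn from over-confident first-layer outputs.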
This publication has been cited 7 times, but none of the citing articles are available in WikiPapers.