Liping Jing

From WikiPapers

Liping Jing is an author.

Publications

Only those publications related to wikis are shown here.
Title | Keyword(s) | Published in | Language | Date | Abstract | R | C
A multi-layer text classification framework based on two-level representation model | Multi-layer classification; Semantics; Text classification; Text representation; Wikipedia | Expert Systems with Applications | English | 2012 | Text categorization is one of the most common themes in data mining and machine learning. Unlike structured data, unstructured text is more difficult to analyze because it contains both complicated syntactic and semantic information. In this paper, we propose a two-level representation model (2RM) to represent text data: one level represents syntactic information and the other represents semantic information. At the syntactic level, each document is represented as a term vector in which each component is the term's TF-IDF (term frequency times inverse document frequency) weight. At the semantic level, the document is represented by the Wikipedia concepts related to the terms in the syntactic level. We also designed a multi-layer classification framework (MLCLA) to exploit the syntactic and semantic information captured by the 2RM model. The MLCLA framework contains three classifiers: two are applied in parallel to the syntactic and semantic levels, and their outputs are combined and fed into the third classifier, which produces the final result. Experimental results on benchmark data sets (20Newsgroups, Reuters-21578 and Classic3) show that the proposed 2RM model plus MLCLA framework improves text classification performance compared with existing flat text representation models (Term-based VSM, Term Semantic Kernel Model, Concept-based VSM, Concept Semantic Kernel Model and Term + Concept VSM) combined with existing classification methods. | 0 | 0
Document Topic Extraction Based on Wikipedia Category | Topic Extraction; Document Representation; Wikipedia Category; Semantic relatedness | CSO | English | 2011 | | 0 | 0
High-order co-clustering text data on semantics-based representation model | High-order co-clustering; Representation Model; Semantics; Text mining; Wikipedia | Lecture Notes in Computer Science | English | 2011 | The language modeling approach has been widely used in recent years to improve the performance of text mining because of its solid theoretical foundation and empirical effectiveness. In essence, this approach centers on estimating an accurate model by choosing appropriate language models and smoothing techniques. Semantic smoothing, which incorporates semantic and contextual information into the language models, is an effective and potentially significant way to improve text mining performance. In this paper, we propose a high-order structure that represents text data by incorporating background knowledge from Wikipedia. The proposed structure consists of three types of objects: terms, documents and concepts. Moreover, we first combine the high-order co-clustering algorithm with the proposed model to cluster documents, terms and concepts simultaneously. Experimental results on benchmark data sets (20Newsgroups and Reuters-21578) show that the proposed high-order co-clustering on the high-order structure outperforms general co-clustering algorithms on bipartite text data, such as document-term, document-concept and document-(term+concept) data. | 0 | 0
Multi-view LDA for semantics-based document representation | Latent Dirichlet allocation; Semantics; Topic model; Wikipedia category | Journal of Computational Information Systems | English | 2011 | Latent Dirichlet Allocation (LDA) models each document and word as a mixture of topics, but it does not use any external semantic information. In this paper, we represent documents in two feature spaces, consisting of words and Wikipedia categories respectively, and propose a new method called Multi-View LDA (M-LDA) that combines LDA with the explicit, human-defined concepts in Wikipedia. M-LDA improves the document topic model by taking advantage of both feature spaces and the mapping between them. Experimental results on classification and clustering tasks show that M-LDA outperforms traditional LDA. | 0 | 0
Text clustering based on granular computing and Wikipedia | Granular computing; Text clustering; Wikipedia | Lecture Notes in Computer Science | English | 2011 | Text clustering plays an important role in many real-world applications, but it faces various challenges, such as the curse of dimensionality, complex semantics and large data volumes. Much research has addressed these problems by designing new text representation models and clustering algorithms. However, text clustering remains an open research problem due to the complicated properties of text data. In this paper, a text clustering procedure is proposed based on the principle of granular computing with the aid of Wikipedia. The proposed method first identifies text granules, focusing especially on concepts and words, with the aid of Wikipedia. It then mines latent patterns based on computations over these granules. Experimental results on benchmark data sets (20Newsgroups and Reuters-21578) show that the proposed method improves text clustering performance compared with existing clustering algorithms combined with existing representation models. | 0 | 0
Text clustering based on granular computing and Wikipedia | Granular computing; Text clustering; Wikipedia | RSKT | English | 2011 | | 0 | 0
Unsupervised feature weighting based on local feature relatedness | Feature Relatedness; Feature Weighting; Semantics; Text Clustering | Lecture Notes in Computer Science | English | 2011 | Feature weighting plays an important role in text clustering. Traditional feature weighting is determined by the syntactic relationship between a feature and a document (e.g., TF-IDF). In this paper, a semantically enriched feature weighting approach is proposed that introduces the semantic relationship between feature and document. It is implemented by taking into account local feature relatedness: the relatedness between a feature and its contextual features within each individual document. Feature relatedness is measured by two methods: a document collection-based implicit relatedness measure and a Wikipedia link-based explicit relatedness measure. Experimental results on benchmark data sets show that the new feature weighting approach surpasses traditional syntactic feature weighting. Moreover, clustering quality can be further improved by linearly combining the syntactic and semantic factors. The new approach is also compared with two existing feature relatedness-based approaches, which consider global feature relatedness (relatedness in the entire feature space) and inter-document feature relatedness (relatedness between different documents), respectively. In the experiments, the new feature weighting approach outperforms these two approaches in clustering quality at a much lower computational cost. | 0 | 0
Semantics-based representation model for multi-layer text classification | Multi-layer Classification; Representation Model; Semantics; Text Classification; Wikipedia | Lecture Notes in Computer Science | English | 2010 | Text categorization is one of the most common themes in data mining and machine learning. Unlike structured data, unstructured text is more complicated to analyze because it contains a great deal of information, e.g., syntactic and semantic. In this paper, we propose a semantics-based model that represents text data at two levels: one for syntactic information and the other for semantic information. The syntactic level represents each document as a term vector whose components record the tf-idf value of each term. The semantic level represents the document with Wikipedia concepts related to the terms in the syntactic level. The syntactic and semantic information are efficiently combined by our proposed multi-layer classification framework. Experimental results on a benchmark data set (Reuters-21578) show that the proposed representation model plus the proposed classification framework improves text classification performance compared with flat text representation models (term VSM, concept VSM, term+concept VSM) combined with existing classification methods. | 0 | 0
Text clustering via term semantic units | Compact representation; Term semantic units; Text clustering | Proceedings - 2010 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2010 | English | 2010 | How best to represent text data is an important problem in text mining tasks, including information retrieval, clustering and classification. In this paper, we propose a compact document representation with term semantic units, which are identified from implicit and explicit semantic information. The implicit semantic information is extracted from the syntactic content via statistical methods such as latent semantic indexing and the information bottleneck method. The explicit semantic information is mined from an external semantic resource (Wikipedia). The proposed compact representation model maps a document collection into a low-dimensional space, because the number of term semantic units is much smaller than the number of unique terms. Experimental results on real data sets show that the compact representation efficiently improves text clustering performance. | 0 | 0
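Several abstracts above describe a syntactic-level representation in which each document is a term vector of TF-IDF weights. The following is a rough illustration only, not the authors' implementation. It assumes the common weighting tf × log(N / df) with raw term counts, whitespace tokenization and lowercasing. The papers' exact TF-IDF variant and preprocessing are not stated here, and the helper name `tfidf_vectors` is hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Hypothetical helper: return (vocabulary, vectors) of tf * log(N / df) weights.

    Assumptions (not from the papers): raw counts for tf, natural-log idf,
    lowercasing and whitespace tokenization, no vector normalization.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocabulary = sorted(df)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term frequency within this document
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocabulary])
    return vocabulary, vectors


if __name__ == "__main__":
    docs = [
        "wikipedia concepts enrich text",
        "text classification with text features",
        "clustering text with wikipedia",
    ]
    vocab, vecs = tfidf_vectors(docs)
    # Print the weights of the second document, one term per line.
    for term, weight in zip(vocab, vecs[1]):
        print(f"{term}: {weight:.3f}")
```

In this toy collection, "text" appears in every document and so receives zero weight. This is the intended effect of the idf factor. Terms that occur in only one document, such as "classification", receive the largest weights.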