Pu Wang


Pu Wang is an author.

Publications

Only those publications related to wikis are shown here.

Diversifying Query Suggestions by using Topics from Wikipedia
Keywords: Query suggestion diversification, Topics, Wikipedia
Published in: Proceedings - 2013 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2013
Language: English
Date: 2013
Abstract: Diversifying query suggestions has recently emerged as a way to recommend queries that are both relevant and diverse. Most existing work diversifies suggestions through query log analysis; however, for structured data, query logs are not always available. To this end, this paper studies the problem of suggesting diverse query terms by using topics from Wikipedia. Wikipedia is a successful online encyclopedia with high coverage of entities and concepts. We first obtain all relevant topics from Wikipedia, and then map each term to these topics. As the mapping is a nontrivial task, we leverage information from both Wikipedia and the structured data to semantically map each term to topics. Finally, we propose a fast algorithm to efficiently generate the suggestions. Extensive evaluations are conducted on a real dataset, and our approach yields promising results.

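The abstract above describes mapping suggestion terms to Wikipedia topics and then generating a list that is both relevant and diverse. As a rough illustration only (not the authors' algorithm), the sketch below greedily picks suggestions by trading off a relevance score against coverage of not-yet-seen topics; the candidate list, scores, and topic sets are hypothetical inputs that would come from the term-to-topic mapping step.

```python
# Minimal sketch: greedy diversification of query suggestions by topic coverage.
# `candidates` maps a suggestion to (relevance_score, set_of_Wikipedia_topics);
# both are assumed to be produced by an upstream term-to-topic mapping step.

def diversify_suggestions(candidates, k=5, diversity_weight=0.5):
    """Pick k suggestions, trading off relevance against new-topic coverage."""
    selected = []
    covered_topics = set()
    remaining = dict(candidates)
    while remaining and len(selected) < k:
        def gain(item):
            _suggestion, (relevance, topics) = item
            novelty = len(topics - covered_topics)   # topics not yet covered
            return (1 - diversity_weight) * relevance + diversity_weight * novelty
        best, (_rel, topics) = max(remaining.items(), key=gain)
        selected.append(best)
        covered_topics |= topics
        del remaining[best]
    return selected

# Hypothetical candidates for the prefix "apple"
candidates = {
    "apple iphone":    (0.9, {"Technology", "Consumer electronics"}),
    "apple ipad":      (0.8, {"Technology", "Consumer electronics"}),
    "apple pie":       (0.6, {"Food"}),
    "apple inc stock": (0.7, {"Finance", "Technology"}),
}
print(diversify_suggestions(candidates, k=3))
```
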
Embedding Semantics in LDA Topic Models
Keywords: Data-driven prior knowledge, External prior knowledge, Latent Dirichlet allocation, Online LDA, Semantic embedding, Topic models, Wikipedia-influenced topic model
Published in: Text Mining: Applications and Theory
Language: English
Date: 2010
Abstract: [No abstract available]

Towards a universal text classifier: Transfer learning using encyclopedic knowledge
Keywords: Text classifiers, Transfer learning, Wikipedia
Published in: ICDM Workshops 2009 - IEEE International Conference on Data Mining
Language: English
Date: 2009
Abstract: Document classification is a key task for many text mining applications. However, traditional text classification requires labeled data to construct reliable and accurate classifiers. Unfortunately, labeled data are seldom available. In this work, we propose a universal text classifier, which does not require any labeled document. Our approach simulates the capability of people to classify documents based on background knowledge. As such, we build a classifier that can effectively group documents based on their content, under the guidance of a few words describing the classes of interest. Background knowledge is modeled using encyclopedic knowledge, namely Wikipedia. The universal text classifier can also be used to perform document retrieval. In our experiments with real data we test the feasibility of our approach for both the classification and retrieval tasks.

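The abstract describes classifying documents without any labeled training data, guided only by a few words describing each class. As a loose illustration (not the paper's method, which models background knowledge with Wikipedia), the sketch below assigns each document to the class whose seed-word description is closest in a plain TF-IDF space; the seed words and documents are hypothetical.

```python
# Minimal sketch of a labeled-data-free classifier: each class is described by
# a few seed words, and documents are assigned to the closest class description.
# Plain TF-IDF over the documents stands in for a Wikipedia-derived concept space.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

class_descriptions = {            # hypothetical seed words for two classes
    "sports":   "game team player score match season",
    "politics": "government election policy senate vote law",
}
documents = [
    "The team won the match after a late goal by its star player.",
    "The senate passed the new election law after a long vote.",
]

vectorizer = TfidfVectorizer().fit(documents + list(class_descriptions.values()))
doc_vecs = vectorizer.transform(documents)
class_vecs = vectorizer.transform(class_descriptions.values())

similarities = cosine_similarity(doc_vecs, class_vecs)
labels = list(class_descriptions.keys())
predictions = [labels[row.argmax()] for row in similarities]
print(predictions)   # expected: ['sports', 'politics']
```
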
Using Wikipedia knowledge to improve text classification
Keywords: Text classification, Thesaurus, Wikipedia
Published in: Knowl. Inf. Syst.
Language: English
Date: 2009
Abstract: Text classification has been widely used to assist users with the discovery of useful information from the Internet. However, traditional classification methods are based on the "Bag of Words" (BOW) representation, which only accounts for term frequency in the documents, and ignores important semantic relationships between key terms. To overcome this problem, previous work attempted to enrich text representation by means of manual intervention or automatic document expansion. The achieved improvement is unfortunately very limited, due to the poor coverage capability of the dictionary, and to the ineffectiveness of term expansion. In this paper, we automatically construct a thesaurus of concepts from Wikipedia. We then introduce a unified framework to expand the BOW representation with semantic relations (synonymy, hyponymy, and associative relations), and demonstrate its efficacy in enhancing previous approaches for text classification. Experimental results on several data sets show that the proposed approach, integrated with the thesaurus built from Wikipedia, can achieve significant improvements with respect to the baseline algorithm.

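The framework described above expands the BOW representation with related concepts drawn from a Wikipedia-derived thesaurus. The toy sketch below shows the general idea only; the thesaurus entries, relation strengths, and damping weight are made-up stand-ins, not the paper's construction.

```python
# Minimal sketch: enriching a bag-of-words vector with thesaurus relations
# (synonymy, hyponymy, associative links). The thesaurus here is a hand-made
# stand-in for one mined from Wikipedia.
from collections import Counter

thesaurus = {   # hypothetical related-term table with relation strengths
    "car":    {"automobile": 1.0, "vehicle": 0.5},
    "doctor": {"physician": 1.0, "medicine": 0.5},
}

def enrich_bow(tokens, thesaurus, weight=0.5):
    """Add related concepts to the term-frequency vector with a damped weight."""
    bow = Counter(tokens)
    enriched = Counter(bow)
    for term, freq in bow.items():
        for related, strength in thesaurus.get(term, {}).items():
            enriched[related] += weight * strength * freq
    return enriched

print(enrich_bow(["car", "doctor", "visit"], thesaurus))
```
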
Building semantic kernels for text classification using Wikipedia
Language: English
Date: 2008
Abstract: Document classification presents difficult challenges due to the sparsity and the high dimensionality of text data, and to the complex semantics of natural language. The traditional document representation is a word-based vector (Bag of Words, or BOW), where each dimension is associated with a term of the dictionary containing all the words that appear in the corpus. Although simple and commonly used, this representation has several limitations. It is essential to embed semantic information and conceptual patterns in order to enhance the prediction capabilities of classification algorithms. In this paper, we overcome the shortcomings of the BOW approach by embedding background knowledge derived from Wikipedia into a semantic kernel, which is then used to enrich the representation of documents. Our empirical evaluation with real data sets demonstrates that our approach successfully achieves improved classification accuracy with respect to the BOW technique, and to other recently developed methods.

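The abstract describes folding Wikipedia-derived background knowledge into a kernel so that documents sharing no terms can still be similar through shared concepts. The sketch below shows the general shape of such a semantic kernel with a toy term-to-concept matrix; the matrices and the specific composition are illustrative assumptions, not the paper's construction.

```python
# Minimal sketch of a semantic kernel: document vectors are mapped through a
# term-by-concept proximity matrix S (in the paper, derived from Wikipedia)
# before inner products are computed.
import numpy as np

# Hypothetical toy data: 2 documents over 4 terms, 4 terms mapped to 2 concepts.
D = np.array([[1.0, 1.0, 0.0, 0.0],    # doc 1 uses terms 0 and 1
              [0.0, 0.0, 1.0, 1.0]])   # doc 2 uses terms 2 and 3
S = np.array([[1.0, 0.0],              # term 0 -> concept A
              [1.0, 0.0],              # term 1 -> concept A
              [1.0, 0.0],              # term 2 -> concept A (shared concept)
              [0.0, 1.0]])             # term 3 -> concept B

bow_kernel = D @ D.T                    # plain BOW kernel: docs look unrelated
semantic_kernel = (D @ S) @ (D @ S).T   # concept overlap makes them similar

print(bow_kernel)        # off-diagonal entry is 0
print(semantic_kernel)   # off-diagonal entry is > 0
```
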
Using Wikipedia for Co-clustering Based Cross-Domain Text Classification
Keywords: Cross-domain text classification, Co-clustering, Transfer learning, Wikipedia
Published in: ICDM
Language: English
Date: 2008

Improving text classification by using encyclopedia knowledge
Published in: Proceedings - IEEE International Conference on Data Mining, ICDM
Language: English
Date: 2007
Abstract: The exponential growth of text documents available on the Internet has created an urgent need for accurate, fast, and general purpose text classification algorithms. However, the "bag of words" representation used for these classification methods is often unsatisfactory, as it ignores relationships between important terms that do not co-occur literally. In order to deal with this problem, we integrate background knowledge - in our application, Wikipedia - into the process of classifying text documents. The experimental evaluation on Reuters newsfeeds and several other corpora shows that our classification results with encyclopedia knowledge are much better than those of the baseline "bag of words" methods.