Hua Li

From WikiPapers

Hua Li is an author.

Publications

Only those publications related to wikis are shown here. Each entry lists the title, keyword(s), publication venue, language, date, and abstract.

Title: A composite kernel approach for dialog topic tracking with structured domain knowledge from Wikipedia
Published in: 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference
Language: English
Date: 2014
Abstract: Dialog topic tracking aims at analyzing and maintaining topic transitions in ongoing dialogs. This paper proposes a composite kernel approach for dialog topic tracking that utilizes various types of domain knowledge obtained from Wikipedia. Two kernels are defined on history sequences and on context trees constructed from the extracted features. The experimental results show that our composite kernel approach can significantly improve the performance of topic tracking in mixed-initiative human-human dialogs.
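
As a loose illustration of the composite-kernel idea in this abstract: the weighted sum of two valid kernels is itself a valid kernel and can be fed to any kernelized classifier. The sketch below is a minimal stand-in, not the paper's implementation; the history and context feature views, the base kernels, and the mixing weight alpha are all assumptions.

```python
# Minimal composite-kernel sketch: a weighted sum of two positive semi-definite
# kernels is again a valid kernel. The paper builds its kernels from dialog
# history sequences and Wikipedia-derived context trees; here both feature
# views are random toy vectors.
import numpy as np
from sklearn.svm import SVC

def linear_kernel(A, B):
    return A @ B.T

def rbf_kernel(A, B, gamma=0.5):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def composite_kernel(Xh_a, Xh_b, Xc_a, Xc_b, alpha=0.6):
    # Xh_*: "history" feature view, Xc_*: "context" feature view (hypothetical).
    return alpha * linear_kernel(Xh_a, Xh_b) + (1 - alpha) * rbf_kernel(Xc_a, Xc_b)

rng = np.random.default_rng(0)
Xh, Xc = rng.random((40, 8)), rng.random((40, 5))   # toy feature views
y = rng.integers(0, 3, size=40)                     # toy topic labels

K = composite_kernel(Xh, Xh, Xc, Xc)
clf = SVC(kernel="precomputed").fit(K, y)
print(clf.predict(K[:5]))   # rows of test-vs-train kernel values
```
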
Title: User interest profile identification using Wikipedia knowledge database
Keyword(s): Family similarity; URL decay model; User profile; Web page classification; Wikipedia knowledge network
Published in: Proceedings - 2013 IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 2013 IEEE International Conference on Embedded and Ubiquitous Computing, EUC 2013
Language: English
Date: 2014
Abstract: Interesting, targeted, and relevant advertising is considered one of the most valuable outcomes of personalized recommendation. Topic identification is the key technique for handling unstructured web pages, and conventional bag-of-words content classification approaches have difficulty processing web pages at scale. In this paper, Wikipedia Category Network (WCN) nodes are used to identify a web page's topic and to estimate a user's interest profile. Wikipedia is the largest content knowledge database and is updated dynamically. A basic interest data set is marked on the WCN, and the topic characterization for each WCN node is generated from the depth and breadth of that data set. To reduce deviation in the breadth, a family generation algorithm is proposed to estimate the generation weight in the WCN. Finally, an interest decay model based on URL counts is proposed to represent a user's interest profile over time. Experimental results show that web page topic identification using the WCN with the family model performs well, and that the profile identification model adapts dynamically for active users.
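
The abstract names but does not specify the interest decay model; the following is a hedged sketch of one simple form such a model could take, where each topic's weight decays across time windows unless refreshed by new page visits. The exponential form, the decay rate, and the data layout are assumptions.

```python
# Hedged sketch of a URL-count-based interest decay model: each topic's weight
# decays across time windows unless refreshed by newly visited pages. The
# exponential form and the 0.8 rate are assumptions, not the paper's model.
from collections import defaultdict

def interest_profile(visits_by_window, decay=0.8):
    """visits_by_window: list of {topic: url_count} dicts, oldest first."""
    profile = defaultdict(float)
    for window in visits_by_window:
        for topic in profile:                  # age existing interests
            profile[topic] *= decay
        for topic, count in window.items():    # reinforce with new visits
            profile[topic] += count
    total = sum(profile.values()) or 1.0
    return {topic: w / total for topic, w in profile.items()}

print(interest_profile([{"Sports": 5}, {"Sports": 1, "Finance": 4}, {"Finance": 6}]))
```
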
Title: Wikipedia-based Kernels for dialogue topic tracking
Keyword(s): Dialogue topic tracking; Kernel methods; Spoken dialogue systems; Wikipedia
Published in: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Language: English
Date: 2014
Abstract: Dialogue topic tracking aims to segment ongoing dialogues into topically coherent sub-dialogues and to predict the topic category of each following segment. This paper proposes a kernel method for dialogue topic tracking that utilizes various types of information obtained from Wikipedia. The experimental results show that our proposed approach can significantly improve performance on the task in mixed-initiative human-human dialogues.
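
The segmentation subtask mentioned in this abstract can be sketched independently of the paper's kernel machinery: start a new sub-dialogue whenever adjacent utterances stop resembling each other. Plain TF-IDF vectors and the similarity threshold below are assumptions; the paper's features come from Wikipedia.

```python
# Sketch of the segmentation subtask only: start a new sub-dialogue when the
# cosine similarity between adjacent utterance vectors drops below a threshold.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def segment(utterances, threshold=0.1):
    X = TfidfVectorizer().fit_transform(utterances).toarray()
    segments, current = [], [utterances[0]]
    for i in range(1, len(utterances)):
        a, b = X[i - 1], X[i]
        sim = (a @ b) / ((np.linalg.norm(a) * np.linalg.norm(b)) or 1.0)
        if sim < threshold:                  # topic shift: close the segment
            segments.append(current)
            current = []
        current.append(utterances[i])
    segments.append(current)
    return segments

dialog = ["where can I eat laksa", "the food court serves laksa",
          "how do I get to the museum", "take the red line two stops"]
print(segment(dialog))
```
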
Title: Chinese text filtering based on domain keywords extracted from Wikipedia
Keyword(s): Text filtering; User profile; Wikipedia
Published in: Lecture Notes in Electrical Engineering
Language: English
Date: 2013
Abstract: Several machine learning and information retrieval algorithms have been used for text filtering. These methods share the requirement of positive and negative examples to build a user profile, yet not all applications can obtain good training documents. In this paper, we present a Wikipedia-based method for building a user profile without any other training documents. The proposed method extracts keywords of a given category from the Wikipedia taxonomy and computes the weights of the extracted keywords from Wikipedia pages. Experimental results on the Chinese news text dataset SogouC show that the proposed method achieves good performance.
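
Once category keywords and weights exist, the filtering step itself is straightforward; the sketch below shows the general shape of keyword-weight filtering. The weights here are invented, whereas the paper derives them from Wikipedia's taxonomy and pages.

```python
# Minimal sketch of training-free filtering with category keywords: score a
# document by the summed weights of the domain keywords it contains and keep
# it if the score clears a threshold.
def score(tokens, keyword_weights):
    return sum(keyword_weights.get(tok, 0.0) for tok in tokens)

def filter_docs(docs, keyword_weights, threshold=1.0):
    return [d for d in docs if score(d.split(), keyword_weights) >= threshold]

weights = {"basketball": 0.9, "league": 0.6, "match": 0.4}   # hypothetical
docs = ["the basketball league opens tonight", "stocks fell sharply today"]
print(filter_docs(docs, weights))   # only the sports sentence survives
```
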
Title: Group matrix factorization for scalable topic modeling
Keyword(s): Large scale; Matrix factorization; Topic modeling
Published in: SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval
Language: English
Date: 2012
Abstract: Topic modeling can reveal the latent structure of text data and is useful for knowledge discovery, search relevance ranking, document classification, and so on. One of the major challenges in topic modeling is dealing with large datasets and large numbers of topics in real-world applications. In this paper, we investigate techniques for scaling up non-probabilistic topic modeling approaches such as Regularized Latent Semantic Indexing (RLSI) and Non-negative Matrix Factorization (NMF). We propose a general topic modeling method, referred to as Group Matrix Factorization (GMF), to enhance the scalability and efficiency of the non-probabilistic approaches. GMF assumes that the text documents have already been categorized into multiple semantic classes, and that there exist class-specific topics for each class as well as topics shared across all classes. Topic modeling is then formalized as the problem of minimizing a general objective function with regularizations and/or constraints on the class-specific and shared topics. In this way, the learning of class-specific topics can be conducted in parallel, greatly improving scalability and efficiency. We apply GMF to RLSI and NMF, obtaining Group RLSI (GRLSI) and Group NMF (GNMF) respectively. Experiments on a Wikipedia dataset and a real-world web dataset, each containing about 3 million documents, show that GRLSI and GNMF greatly improve RLSI and NMF in terms of scalability and efficiency, and that the topics they discover are coherent and readable. Further experiments on a search relevance dataset containing 30,000 labeled queries show that using the topics learned by GRLSI and GNMF significantly improves search relevance.
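
The shared-versus-class-specific decomposition can be made concrete with a small sketch: under the stated assumption that documents are pre-grouped into classes, each class matrix D_c is factorized as D_c ≈ [W_s | W_c] H_c, with W_s shared across classes. The multiplicative-update rules below are a standard NMF choice, not necessarily the paper's exact GNMF algorithm, and all sizes are toy values.

```python
import numpy as np

def group_nmf(Ds, k_shared=2, k_spec=2, iters=200, eps=1e-9):
    """Factorize each class matrix D_c ~ [W_s | W_c] H_c with shared W_s."""
    rng = np.random.default_rng(0)
    n = Ds[0].shape[0]
    W_s = rng.random((n, k_shared))
    W_c = [rng.random((n, k_spec)) for _ in Ds]
    H = [rng.random((k_shared + k_spec, D.shape[1])) for D in Ds]
    for _ in range(iters):
        for c, D in enumerate(Ds):        # class-local updates, parallelizable
            W = np.hstack([W_s, W_c[c]])
            H[c] *= (W.T @ D) / (W.T @ (W @ H[c]) + eps)
            Hc = H[c][k_shared:]
            WH = np.hstack([W_s, W_c[c]]) @ H[c]
            W_c[c] *= (D @ Hc.T) / (WH @ Hc.T + eps)
        # Shared topics aggregate sufficient statistics over all classes.
        num = sum(D @ H[c][:k_shared].T for c, D in enumerate(Ds))
        den = sum((np.hstack([W_s, W_c[c]]) @ H[c]) @ H[c][:k_shared].T
                  for c in range(len(Ds))) + eps
        W_s *= num / den
    return W_s, W_c, H

rng = np.random.default_rng(1)
Ds = [rng.random((30, 50)), rng.random((30, 40))]  # two toy "classes"
W_s, W_c, H = group_nmf(Ds)
print(np.linalg.norm(Ds[0] - np.hstack([W_s, W_c[0]]) @ H[0]))  # residual
```
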
Title: Mash-up approach for web video category recommendation
Published in: Proceedings - 4th Pacific-Rim Symposium on Image and Video Technology, PSIVT 2010
Language: English
Date: 2010
Abstract: With the advent of Web 2.0, billions of videos are now freely available online. Meanwhile, rich user-generated information about these videos, such as tags and online encyclopedia entries, offers a chance to enhance existing video analysis technologies. In this paper, we propose a mash-up framework for video category recommendation that leverages web information from different sources. Under this framework, we build a web video dataset from the YouTube API and construct a concept collection for web video category recommendation (CCWV-CR) from this dataset, consisting of web video concepts with a small semantic gap and high categorization distinguishability. In addition, Wikipedia Propagation is proposed to optimize the video similarity measurement. Experiments on a large-scale dataset of 80,031 web videos demonstrate that: (1) the mash-up category recommendation framework substantially improves on existing state-of-the-art methods; (2) CCWV-CR is an efficient feature space for video category recommendation; and (3) Wikipedia Propagation boosts the performance of video category recommendation.
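
The abstract gives no formula for Wikipedia Propagation, so the sketch below shows only one plausible reading: smoothing each video's concept-score vector through a concept-relatedness matrix before comparing videos. The relatedness matrix R, the mixing weight alpha, and the number of steps are all assumptions.

```python
# One plausible reading of "Wikipedia Propagation": diffuse concept scores
# over a concept-relatedness graph, then compare the smoothed vectors.
import numpy as np

def propagate(X, R, alpha=0.5, steps=2):
    """X: (videos x concepts) scores; R: row-normalized concept relatedness."""
    for _ in range(steps):
        X = alpha * (X @ R) + (1 - alpha) * X
    return X

rng = np.random.default_rng(0)
X = rng.random((4, 6))                        # toy concept scores for 4 videos
R = rng.random((6, 6))
R /= R.sum(axis=1, keepdims=True)             # row-normalize relatedness
Xp = propagate(X, R)
norms = np.linalg.norm(Xp, axis=1, keepdims=True)
print(np.round(Xp @ Xp.T / (norms @ norms.T), 2))   # video-video similarity
```
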
Title: Multi-view bootstrapping for relation extraction by exploring web features and linguistic features
Published in: Lecture Notes in Computer Science
Language: English
Date: 2010
Abstract: Binary semantic relation extraction from Wikipedia is particularly useful for various NLP and Web applications. Currently, frequent-pattern-mining-based methods and syntactic-analysis-based methods are the two leading approaches to the semantic relation extraction task. With a novel view on integrating syntactic analysis of Wikipedia text with redundancy information from the Web, we propose a multi-view learning approach that bootstraps relationships between entities by exploiting the complementarity between the Web view and the linguistic view. On the one hand, from the linguistic view, linguistic features are generated by parsing Wikipedia texts, abstracting away from the different surface realizations of semantic relations. On the other hand, Web features are extracted from the Web corpus to provide frequency information for relation extraction. Experimental evaluation on a relational dataset demonstrates that linguistic analysis of Wikipedia texts and collective Web information reveal different aspects of the nature of entity-related semantic relationships. It also shows that our multi-view learning method considerably boosts performance compared to learning with only one view of features, with the weaknesses of one view complemented by the strengths of the other.
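
The abstract does not detail the bootstrapping procedure, so the sketch below shows a generic co-training-style loop over two feature views (stand-ins for the Web view and the linguistic view): per-view classifiers jointly pseudo-label the unlabeled pairs they agree on most confidently, then retrain. All data, features, and round counts are synthetic assumptions.

```python
# Co-training-style multi-view bootstrapping sketch with two synthetic views.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 200, 10
Xw, Xl = rng.random((n, d)), rng.random((n, d))    # "Web" / "linguistic" views
y = (Xw[:, 0] + Xl[:, 0] > 1.0).astype(int)        # synthetic relation labels

labeled = list(range(20))                          # small seed set
pseudo = [int(v) for v in y[:20]]
unlabeled = list(range(20, n))

for _ in range(5):                                 # bootstrapping rounds
    cw = LogisticRegression().fit(Xw[labeled], pseudo)
    cl = LogisticRegression().fit(Xl[labeled], pseudo)
    pw = cw.predict_proba(Xw[unlabeled])
    pl = cl.predict_proba(Xl[unlabeled])
    joint = pw * pl                                # agreement-weighted confidence
    picks = np.argsort(joint.max(axis=1))[-10:]    # most confident examples
    for j in picks:
        labeled.append(unlabeled[j])
        pseudo.append(int(cw.classes_[joint[j].argmax()]))  # pseudo-label
    unlabeled = [u for j, u in enumerate(unlabeled) if j not in set(picks)]

print("Web view:", cw.score(Xw, y), "linguistic view:", cl.score(Xl, y))
```
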
Title: Using semantic Wikis as collaborative tools for geo-ontology
Keyword(s): Collaborative work; Geo-ontology; Semantic web; Semantic Wikis
Published in: 2010 18th International Conference on Geoinformatics, Geoinformatics 2010
Language: English
Date: 2010
Abstract: As ontology has become a convenient vehicle for domain knowledge and metadata, it is used to realize information sharing at the semantic level in geoscience. Building a geo-ontology is a systematic engineering effort that requires collaborative work, yet there is a lack of ontology editing tools that support such collaboration. Since Wikis are cooperative tools for the easy writing and sharing of content, and semantic Wikis extend them with Semantic Web techniques for representing semantic information, we propose using semantic Wikis as cooperative tools for building geo-ontology. An architecture similar to a semantic Wiki for geo-ontology editing is presented, together with the semantic hierarchy of geo-information and the evolution mechanism of the geo-ontology within it. The usefulness of the approach is demonstrated by a small case study.
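
To make the semantic-wiki representation concrete, the fragment below encodes a tiny geo-ontology hierarchy as RDFS triples with rdflib. The namespace and class names are hypothetical, and rdflib is simply a convenient stand-in for whatever triple store a semantic Wiki backs onto.

```python
# Tiny sketch: wiki-page annotations map naturally onto RDFS statements.
from rdflib import Graph, Namespace, RDF, RDFS, Literal

GEO = Namespace("http://example.org/geo-ontology#")   # hypothetical namespace
g = Graph()
g.bind("geo", GEO)

g.add((GEO.River, RDF.type, RDFS.Class))
g.add((GEO.WaterBody, RDF.type, RDFS.Class))
g.add((GEO.River, RDFS.subClassOf, GEO.WaterBody))    # semantic hierarchy
g.add((GEO.Yangtze, RDF.type, GEO.River))
g.add((GEO.Yangtze, RDFS.label, Literal("Yangtze River")))

print(g.serialize(format="turtle"))
```
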
Title: Enhancing text clustering by leveraging Wikipedia semantics
Language: English
Date: 2008
Abstract: Most traditional text clustering methods are based on the "bag of words" (BOW) representation, which relies on frequency statistics over a set of documents. BOW, however, ignores important information about the semantic relationships between key terms. To overcome this problem, several methods have been proposed to enrich text representations with external resources, such as WordNet. However, many of these approaches suffer from limitations: 1) WordNet has limited coverage and lacks effective word-sense disambiguation; 2) most text-representation enrichment strategies, which append or replace document terms with their hypernyms and synonyms, are overly simple. In this paper, to overcome these deficiencies, we first propose a way to build a concept thesaurus based on the semantic relations (synonym, hypernym, and associative relation) extracted from Wikipedia. Then, we develop a unified framework that leverages these semantic relations to enhance traditional content similarity measures for text clustering. Experimental results on the Reuters and OHSUMED datasets show that, with the help of the Wikipedia thesaurus, the clustering performance of our method improves over previous methods. In addition, with optimized weights for hypernym, synonym, and associative concepts, tuned with the help of a few user-provided labeled examples, the clustering performance can be further improved.
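
The enrichment step described in this abstract can be sketched simply: related concepts from a thesaurus are appended to each document before vectorizing and clustering, so documents that share concepts but not surface terms move closer together. The toy thesaurus below stands in for relations mined from Wikipedia, and the uniform concept weight is an assumption; the paper tunes separate weights for synonym, hypernym, and associative relations.

```python
# Sketch of thesaurus-based enrichment followed by k-means clustering.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

thesaurus = {                      # hypothetical concept relations
    "puppy": ["dog", "animal"],
    "kitten": ["cat", "animal"],
    "stock": ["finance", "market"],
    "bond": ["finance", "market"],
}

def enrich(doc):
    extra = [c for tok in doc.split() for c in thesaurus.get(tok, [])]
    return doc + " " + " ".join(extra)

docs = ["puppy plays outside", "kitten sleeps inside",
        "stock prices rose", "bond yields fell"]
X = TfidfVectorizer().fit_transform([enrich(d) for d in docs])
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))
```
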