Hongyan Liu

From WikiPapers

Hongyan Liu is an author.

Publications

Only those publications related to wikis are shown here.
Title Keyword(s) Published in Language Date Abstract R C

Title: Enhancing accessibility of microblogging messages using semantic knowledge
Keyword(s): Accessibility, Clustering, Labeling, Microblogging
Published in: International Conference on Information and Knowledge Management, Proceedings
Language: English
Date: 2011
Abstract: The volume of microblogging messages is increasing exponentially with the popularity of microblogging services. The large number of messages appearing in user interfaces hinders user accessibility to useful information buried in disorganized, incomplete, and unstructured text messages. In order to enhance user accessibility, we propose to aggregate related microblogging messages into clusters and automatically assign them semantically meaningful labels. However, a distinctive feature of microblogging messages is that they are much shorter than conventional text documents. These messages provide inadequate term co-occurrence information for capturing semantic associations. To address this problem, we propose a novel framework for organizing unstructured microblogging messages by transforming them to a semantically structured representation. The proposed framework first captures informative tree fragments by analyzing a parse tree of the message, and then exploits external knowledge bases (Wikipedia and WordNet) to enhance their semantic information. Empirical evaluation on a Twitter dataset shows that our framework significantly outperforms existing state-of-the-art methods.
R: 0
C: 0

Title: Human gene/protein synonym dictionary from WikiLinks
Keyword(s): Encyclopedia, Gene/protein synonym, Information storage and retrieval, Names, Terminology, WikiLinks, Wikipedia
Published in: 2011 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2011
Language: English
Date: 2011
Abstract: Many genes and proteins have alternate names (synonyms) in scientific literature, posing a challenge to effectively organizing and exchanging information. To address this issue, there have been several initiatives to collate the synonyms into dictionaries. Biothesaurus is an extensive dictionary derived from multiple authoritative sources. Despite its extensive coverage, there are still some synonyms not covered by Biothesaurus. Wikipedia could be a useful source of the missing synonyms, as it has a diverse set of contributors in comparison with the authoritative resources that constitute Biothesaurus. This paper reports a feasibility study of using WikiLinks to find synonyms that are not currently covered by Biothesaurus. Wikipedia pages containing the word gene or protein were included in this study. 121 candidate synonyms were extracted from WikiLinks referencing 7,339 (16%) human genes. This number is significant, given that Biothesaurus was earlier evaluated to have a coverage of 87%. Hence, WikiLinks were found to be a useful source for collating gene synonyms that are not recorded in authoritative databases. Biothesaurus was evaluated to cover 52% of the extracted candidate synonyms not documented in NCBI. The current study will be extended in scope to cover all genes and to extract synonyms from free text in Wikipedia pages.
R: 0
C: 0

Title: Quantifying the trustworthiness of social media content
Keyword(s): Content, Quality, Social media, Trust evaluation, Trustworthiness
Published in: Distributed and Parallel Databases
Language: English
Date: 2011
Abstract: The growing popularity of social media in recent years has resulted in the creation of an enormous amount of user-generated content. A significant portion of this information is useful and has proven to be a great source of knowledge. However, since much of this information has been contributed by strangers with little or no apparent reputation to speak of, there is no easy way to detect whether the content is trustworthy. Search engines are the gateways to knowledge but search relevance cannot guarantee that the content in the search results is trustworthy. A casual observer might not be able to differentiate between trustworthy and untrustworthy content. This work is focused on the problem of quantifying the value of such shared content with respect to its trustworthiness. In particular, the focus is on shared health content as the negative impact of acting on untrustworthy content is high in this domain. Health content from two social media applications, Wikipedia and Daily Strength, is used for this study. Sociological notions of trust are used to motivate the search for a solution. A two-step unsupervised, feature-driven approach is proposed for this purpose: a feature identification step in which relevant information categories are specified and suitable features are identified, and a quantification step for which various unsupervised scoring models are proposed. Results indicate that this approach is effective and can be adapted to disparate social media applications with ease.
R: 0
C: 0

Title: Computing semantic relatedness between named entities using Wikipedia
Keyword(s): Semantic relatedness, Web mining
Published in: Proceedings - International Conference on Artificial Intelligence and Computational Intelligence, AICI 2010
Language: English
Date: 2010
Abstract: In this paper the authors suggest a novel approach that uses Wikipedia to measure the semantic relatedness between Chinese named entities, such as names of persons, books, software, etc. The relatedness is measured through articles in Wikipedia that are related to the named entities. The authors select a set of "definition words", which are hyperlinks from these articles, and then compute the relatedness between two named entities as the relatedness between two sets of definition words. The authors propose two ways to measure the relatedness between two definition words: by Wiki-articles related to the words or by categories of the words. The proposed approaches are compared with several other baseline models through experiments. The experimental results show that this method renders satisfactory results.
R: 0
C: 0