Bin Wang

From WikiPapers

Bin Wang is an author.


Only those publications related to wikis are shown here.
Title Keyword(s) Published in Language Date Abstract R C
A correlation-based semantic model for text search Semantic correlation, Text search Lecture Notes in Computer Science English 2014 With the exponential growth of texts on the Internet, text search is considered a crucial problem in many fields. Most traditional text search approaches rely on a "bag of words" text representation built from frequency statistics. However, these approaches ignore the semantic correlation of words in the text, which may lead to inaccurate ranking of the search results. In this paper, we propose a new Wikipedia-based similar text search approach in which the words in the texts and the query text can be semantically correlated through Wikipedia. We propose a new text representation model and a new text similarity metric. Finally, experiments on a real dataset demonstrate the high precision, recall and efficiency of our approach. 0 0
Query dependent pseudo-relevance feedback based on Wikipedia English 2009 Pseudo-relevance feedback (PRF) via query expansion has been proven to be effective in many information retrieval (IR) tasks. In most existing work, the top-ranked documents from an initial search are assumed to be relevant and used for PRF. One problem with this approach is that one or more of the top retrieved documents may be non-relevant, which can introduce noise into the feedback process. Besides, existing methods generally do not take into account the significantly different types of queries that are often entered into an IR system. Intuitively, Wikipedia can be seen as a large, manually edited document collection which could be exploited to improve document retrieval effectiveness within PRF. It is not obvious how we might best utilize information from Wikipedia in PRF, and to date, the potential of Wikipedia for this task has been largely unexplored. In our work, we present a systematic exploration of the utilization of Wikipedia in PRF for query-dependent expansion. Specifically, we classify TREC topics into three categories based on Wikipedia: 1) entity queries, 2) ambiguous queries, and 3) broader queries. We propose and study the effectiveness of three methods for expansion term selection, each modeling the Wikipedia-based pseudo-relevance information from a different perspective. We incorporate the expansion terms into the original query and use language modeling IR to evaluate these methods. Experiments on four TREC test collections, including the large web collection GOV2, show that retrieval performance of each type of query can be improved. In addition, we demonstrate that the proposed method outperforms the baseline relevance model in terms of precision and robustness. 0 0
Entity-based query reformulation using Wikipedia CIKM English 2008 Many real-world applications increasingly involve both structured data and text, and entity-based retrieval is an important problem in this realm. In this paper, we present an automatic query reformulation approach based on entities detected in each query. The aim is to utilize semantics associated with entities for enhancing document retrieval. This is done by expanding a query with terms/phrases related to entities in the query. We exploit Wikipedia as a large repository of entity information. Our reformulation approach consists of three major steps: (1) detect the representative entity in a query; (2) expand the query with entity-related terms/phrases; and (3) facilitate term dependency features. We evaluate our approach on the ad-hoc retrieval task on four TREC collections, including two large web collections. Experimental results show that significant improvement is possible by utilizing information corresponding to entities. 0 0