Fei Wu

From WikiPapers

Fei Wu is an author.

Publications

Only those publications related to wikis are shown here.
Each entry lists: Title, Keyword(s), Published in, Language, Date, Abstract, and the wiki's R and C counts.

Cross-media topic mining on Wikipedia
Keywords: Cross media, Sparsity, Topic modeling, Wikipedia
Published in: MM 2013 - Proceedings of the 2013 ACM Multimedia Conference
Language: English
Date: 2013
Abstract: As a collaborative wiki-based encyclopedia, Wikipedia provides a huge number of articles in various categories. In addition to this text corpus, Wikipedia also contains plenty of images, which make the articles more intuitive for readers to understand. To better organize these visual and textual data, one promising area of research is to jointly model the embedding topics across multi-modal data (i.e., cross-media) from Wikipedia. In this work, we propose to learn the projection matrices that map the data from heterogeneous feature spaces into a unified latent topic space. Different from previous approaches, by imposing ℓ1 regularizers on the projection matrices, only a small number of relevant visual/textual words are associated with each topic, which makes our model more interpretable and robust. Furthermore, the correlations of Wikipedia data in different modalities are explicitly considered in our model. The effectiveness of the proposed topic extraction algorithm is verified by several experiments conducted on real Wikipedia datasets.
R: 0, C: 0

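The abstract above describes a sparse latent-topic space shared across text and images. The sketch below shows one minimal way such a model could look, assuming paired text/image feature matrices and an ℓ1-penalized alternating optimization; the objective, variable names, and update rules are illustrative assumptions, not the authors' exact formulation.

    # Assumed formulation: factor paired text/image features against a shared
    # document-topic matrix T, with l1 penalties on the topic-word matrices
    # Wt, Wv so each topic keeps only a few textual/visual words:
    #   min_{T,Wt,Wv} 1/2||Xt - T Wt||_F^2 + 1/2||Xv - T Wv||_F^2 + lam(||Wt||_1 + ||Wv||_1)
    import numpy as np

    def soft_threshold(M, t):
        return np.sign(M) * np.maximum(np.abs(M) - t, 0.0)

    def cross_media_topics(Xt, Xv, k=10, lam=0.1, lr=1e-3, iters=200):
        rng = np.random.default_rng(0)
        T = rng.standard_normal((Xt.shape[0], k))
        Wt = rng.standard_normal((k, Xt.shape[1])) * 0.01
        Wv = rng.standard_normal((k, Xv.shape[1])) * 0.01
        for _ in range(iters):
            # shared document-topic loadings: least-squares fit to both modalities
            A, W = np.hstack([Xt, Xv]), np.hstack([Wt, Wv])
            T = A @ W.T @ np.linalg.pinv(W @ W.T)
            # sparse topic-word matrices: one proximal-gradient (ISTA) step each
            Wt = soft_threshold(Wt - lr * T.T @ (T @ Wt - Xt), lr * lam)
            Wv = soft_threshold(Wv - lr * T.T @ (T @ Wv - Xv), lr * lam)
        return T, Wt, Wv
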
Finding related tables
Keywords: Data integration, Related tables, Web tables
Published in: Proceedings of the ACM SIGMOD International Conference on Management of Data
Language: English
Date: 2012
Abstract: We consider the problem of finding related tables in a large corpus of heterogeneous tables. Detecting related tables provides users a powerful tool for enhancing their tables with additional data and enables effective reuse of available public data. Our first contribution is a framework that captures several types of relatedness, including tables that are candidates for joins and tables that are candidates for union. Our second contribution is a set of algorithms for detecting related tables that can be either unioned or joined. We describe a set of experiments that demonstrate that our algorithms produce highly related tables. We also show that we can often improve the results of table search by pulling up tables that are ranked much lower based on their relatedness to top-ranked tables. Finally, we describe how to scale up our algorithms and show the results of running them on a corpus of over a million tables extracted from Wikipedia.
R: 0, C: 0

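As a rough illustration of the union/join distinction described above, the scoring functions below are simplified assumptions, not the paper's algorithms:

    # Tables are assumed to be dicts {"columns": [...], "rows": [{column: value}]}.
    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if a | b else 0.0

    def union_score(t1, t2):
        # union candidates: the two tables should agree on most column headers
        return jaccard([c.lower() for c in t1["columns"]],
                       [c.lower() for c in t2["columns"]])

    def join_score(t1, c1, t2, c2):
        # join candidates: a column of t1 shares many values with a near-key column of t2
        v1 = {row[c1] for row in t1["rows"]}
        v2 = [row[c2] for row in t2["rows"]]
        key_like = len(set(v2)) / len(v2) if v2 else 0.0
        coverage = len(v1 & set(v2)) / len(v1) if v1 else 0.0
        return key_like * coverage
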
Identifying aspects for web-search queries
Published in: Journal of Artificial Intelligence Research
Language: English
Date: 2011
Abstract: Many web-search queries serve as the beginning of an exploration of an unknown space of information, rather than looking for a specific web page. To answer such queries effectively, the search engine should attempt to organize the space of relevant information in a way that facilitates exploration. We describe the Aspector system that computes aspects for a given query. Each aspect is a set of search queries that together represent a distinct information need relevant to the original search query. To serve as an effective means to explore the space, Aspector computes aspects that are orthogonal to each other and have high combined coverage. Aspector combines two sources of information to compute aspects. We discover candidate aspects by analyzing query logs, and cluster them to eliminate redundancies. We then use a mass-collaboration knowledge base (e.g., Wikipedia) to compute candidate aspects for queries that occur less frequently and to group together aspects that are likely to be "semantically" related. We present a user study that indicates that the aspects we compute are rated favorably against three competing alternatives - related searches proposed by Google, cluster labels assigned by the Clusty search engine, and navigational searches proposed by Bing. © 2011 AI Access Foundation. All rights reserved.
R: 0, C: 0

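One minimal way to picture the "orthogonal aspects with high combined coverage" selection step; this is a hedged sketch with assumed data structures, not the Aspector implementation:

    def select_aspects(candidates, k=8, max_sim=0.4):
        """candidates: list of (aspect_term_set, query_log_frequency) pairs.
        Greedily keep high-frequency aspects that stay dissimilar to those already chosen."""
        chosen = []
        for terms, freq in sorted(candidates, key=lambda c: -c[1]):
            if all(len(terms & t) / (len(terms | t) or 1) <= max_sim for t, _ in chosen):
                chosen.append((terms, freq))
            if len(chosen) == k:
                break
        return chosen
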
Read what you trust: An open wiki model enhanced by social context
Published in: Proceedings - 2011 IEEE International Conference on Privacy, Security, Risk and Trust and IEEE International Conference on Social Computing, PASSAT/SocialCom 2011
Language: English
Date: 2011
Abstract: Wiki systems, such as Wikipedia, provide a multitude of opportunities for large-scale online knowledge collaboration. Despite Wikipedia's success with the open editing model, dissenting voices give rise to unreliable content due to conflicts amongst contributors. From our perspective, the conflict issue results from presenting the same knowledge to all readers, without regard for the importance of the underlying social context, which both reveals the bias of contributors and influences the knowledge perception of readers. Motivated by the insufficiency of the existing knowledge presentation model for wiki systems, this paper presents TrustWiki, a new wiki model which leverages social context, including social background and relationship information, to present readers with personalized and credible knowledge. Our experiment shows that, with reliable social context information, TrustWiki can efficiently assign readers to their compatible editor community and present credible knowledge derived from that community. Although this new wiki model focuses on reinforcing the neutrality policy of Wikipedia, it also casts light on other content reliability problems in wiki systems, such as vandalism and minority opinion suppression.
R: 0, C: 0

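The "assign readers to their compatible editor community" step might look roughly like the following nearest-community match over social-context features; this is an assumed sketch, not TrustWiki's actual mechanism:

    import numpy as np

    def assign_community(reader_vec, community_vecs):
        """community_vecs: dict mapping community name -> mean social-context feature vector."""
        def cos(a, b):
            return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        # route the reader to the most similar editor community
        return max(community_vecs, key=lambda name: cos(reader_vec, community_vecs[name]))
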
Open information extraction using Wikipedia
Published in: ACL
Language: English
Date: 2010
R: 0, C: 0

Amplifying Community Content Creation Using Mixed-Initiative Information Extraction
Language: English
Date: 2009
R: 0, C: 0

Amplifying community content creation with mixed-initiative information extraction
Keywords: Community content creation, Information extraction, Mixed-initiative interfaces
Published in: Conference on Human Factors in Computing Systems - Proceedings
Language: English
Date: 2009
Abstract: Although existing work has explored both information extraction and community content creation, most research has focused on them in isolation. In contrast, we see the greatest leverage in the synergistic pairing of these methods as two interlocking feedback cycles. This paper explores the potential synergy promised if these cycles can be made to accelerate each other by exploiting the same edits to advance both community content creation and learning-based information extraction. We examine our proposed synergy in the context of Wikipedia infoboxes and the Kylin information extraction system. After developing and refining a set of interfaces to present the verification of Kylin extractions as a non-primary task in the context of Wikipedia articles, we develop an innovative use of Web search advertising services to study people engaged in some other primary task. We demonstrate our proposed synergy by analyzing our deployment from two complementary perspectives: (1) we show we accelerate community content creation by using Kylin's information extraction to significantly increase the likelihood that a person visiting a Wikipedia article as a part of some other primary task will spontaneously choose to help improve the article's infobox, and (2) we show we accelerate information extraction by using contributions collected from people interacting with our designs to significantly improve Kylin's extraction performance. Copyright 2009 ACM.
R: 0, C: 0

Using Wikipedia to bootstrap open information extraction
Published in: SIGMOD Rec.
Language: English
Date: 2009
R: 0, C: 0

Augmenting Wikipedia-extraction with results from the web
Published in: AAAI Workshop - Technical Report
Language: English
Date: 2008
Abstract: Not only is Wikipedia a comprehensive source of quality information, it has several kinds of internal structure (e.g., relational summaries known as infoboxes), which enable self-supervised information extraction. While previous efforts at extraction from Wikipedia achieve high precision and recall on well-populated classes of articles, they fail in a larger number of cases, largely because incomplete articles and infrequent use of infoboxes lead to insufficient training data. This paper explains and evaluates a method for improving recall by extracting from the broader Web. There are two key advances necessary to make Web supplementation effective: 1) a method to filter promising sentences from Web pages, and 2) a novel retraining technique to broaden extractor recall. Experiments show that, used in concert with shrinkage, our techniques increase recall by a factor of up to 8 while maintaining or increasing precision.
R: 0, C: 0

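A toy version of the "filter promising sentences from Web pages" step could look like the following; the cue-word heuristic and names are assumptions for illustration only:

    import re

    def promising_sentences(page_text, subject, attribute_cues):
        """Keep sentences that mention the article subject and an attribute cue word."""
        sentences = re.split(r'(?<=[.!?])\s+', page_text)
        subj = subject.lower()
        cues = [c.lower() for c in attribute_cues]
        return [s for s in sentences
                if subj in s.lower() and any(c in s.lower() for c in cues)]

    # e.g. promising_sentences(web_page_text, "Ada Lovelace", ["born", "birth"])
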
Automatically refining the Wikipedia infobox ontology
Keywords: Semantic web, Ontology, Wikipedia, Markov Logic Networks
Published in: 17th International World Wide Web Conference (WWW-08)
Language: English
Date: 2008
Abstract: The combined efforts of human volunteers have recently extracted numerous facts from Wikipedia, storing them as machine-harvestable object-attribute-value triples in Wikipedia infoboxes. Machine learning systems, such as Kylin, use these infoboxes as training data, accurately extracting even more semantic knowledge from natural language text. But in order to realize the full power of this information, it must be situated in a cleanly-structured ontology. This paper introduces KOG, an autonomous system for refining Wikipedia's infobox-class ontology towards this end. We cast the problem of ontology refinement as a machine learning problem and solve it using both SVMs and a more powerful joint-inference approach expressed in Markov Logic Networks. We present experiments demonstrating the superiority of the joint-inference approach and evaluating other aspects of our system. Using these techniques, we build a rich ontology, integrating Wikipedia's infobox-class schemata with WordNet. We demonstrate how the resulting ontology may be used to enhance Wikipedia with improved query processing and other features.
R: 0, C: 0

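To make the "ontology refinement as machine learning" framing concrete, a pared-down subsumption classifier over infobox classes might be set up as below; the features and data layout are assumptions, and KOG additionally couples such decisions through joint inference in Markov Logic Networks:

    from sklearn.svm import LinearSVC

    def pair_features(class_a, class_b):
        # each class is assumed to be {"name": str, "attributes": [attribute names]}
        attrs_a, attrs_b = set(class_a["attributes"]), set(class_b["attributes"])
        overlap = len(attrs_a & attrs_b) / max(len(attrs_a), 1)
        name_contained = 1.0 if class_b["name"].lower() in class_a["name"].lower() else 0.0
        size_ratio = min(len(attrs_a), len(attrs_b)) / max(len(attrs_a), len(attrs_b), 1)
        return [overlap, name_contained, size_ratio]

    def train_subsumption_classifier(labeled_pairs):
        """labeled_pairs: list of ((class_a, class_b), is_subclass_bool) tuples."""
        X = [pair_features(a, b) for (a, b), _ in labeled_pairs]
        y = [int(label) for _, label in labeled_pairs]
        return LinearSVC().fit(X, y)
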
Information extraction from Wikipedia: Moving down the long tail
Keywords: Information extraction, Semantic web, Wikipedia
Published in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Language: English
Date: 2008
Abstract: Not only is Wikipedia a comprehensive source of quality information, it has several kinds of internal structure (e.g., relational summaries known as infoboxes), which enable self-supervised information extraction. While previous efforts at extraction from Wikipedia achieve high precision and recall on well-populated classes of articles, they fail in a larger number of cases, largely because incomplete articles and infrequent use of infoboxes lead to insufficient training data. This paper presents three novel techniques for increasing recall from Wikipedia's long tail of sparse classes: (1) shrinkage over an automatically-learned subsumption taxonomy, (2) a retraining technique for improving the training data, and (3) supplementing results by extracting from the broader Web. Our experiments compare design variations and show that, used in concert, these techniques increase recall by a factor of 1.76 to 8.71 while maintaining or increasing precision.
R: 0, C: 0

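Read very loosely, "shrinkage over an automatically-learned subsumption taxonomy" amounts to borrowing training data from ancestor classes with decreasing weight; the sketch below is an assumed simplification, not the paper's estimator:

    def shrinkage_training_set(cls, parent_of, examples_by_class, decay=0.5):
        """parent_of: dict mapping a class to its parent (None at the root).
        examples_by_class: dict mapping a class to its training examples.
        Returns (example, weight) pairs, down-weighting examples from distant ancestors."""
        weighted, weight, node = [], 1.0, cls
        while node is not None:
            weighted += [(ex, weight) for ex in examples_by_class.get(node, [])]
            node, weight = parent_of.get(node), weight * decay
        return weighted
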
Intelligence in Wikipedia
Published in: AAAI
Language: English
Date: 2008
R: 0, C: 0

Autonomously semantifying Wikipedia
Language: English
Date: 2007
Abstract: Berners-Lee's compelling vision of a Semantic Web is hindered by a chicken-and-egg problem, which can best be solved by a bootstrapping method - creating enough structured data to motivate the development of applications. This paper argues that autonomously "semantifying Wikipedia" is the best way to solve the problem. We choose Wikipedia as an initial data source because it is comprehensive, not too large, high-quality, and contains enough manually-derived structure to bootstrap an autonomous, self-supervised process. We identify several types of structures which can be automatically enhanced in Wikipedia (e.g., link structure, taxonomic data, infoboxes, etc.), and we describe a prototype implementation of a self-supervised, machine learning system which realizes our vision. Preliminary experiments demonstrate the high precision of our system's extracted data - in one case equaling that of humans.
R: 0, C: 1

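The self-supervised bootstrapping the abstract refers to can be pictured as distant labeling from existing infoboxes; the following is a minimal sketch under that assumption, not the system's actual pipeline:

    import re

    def label_sentences(article_text, infobox):
        """infobox: dict mapping attribute name -> value string.
        Any sentence containing an attribute's value becomes a positive
        training sentence for that attribute's extractor."""
        sentences = re.split(r'(?<=[.!?])\s+', article_text)
        labeled = []
        for attr, value in infobox.items():
            for s in sentences:
                if value and value.lower() in s.lower():
                    labeled.append((attr, s))
        return labeled
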