Completing Wikipedia's hyperlink structure through dimensionality reduction
Abstract Wikipedia is the largest monolithic repository of human knowledge. In addition to its sheer size, it represents a new encyclopedic paradigm by interconnecting articles through hyperlinks. However, since these links are created by human authors, links one would expect to see are often missing. The goal of this work is to detect such gaps automatically. In this paper, we propose a novel method for augmenting the structure of hyperlinked document collections such as Wikipedia. It does not require the extraction of any manually defined features from the article to be augmented. Instead, it is based on principal component analysis, a well-founded mathematical generalization technique, and predicts new links purely based on the statistical structure of the graph formed by the existing links. Our method does not rely on the textual content of articles; we are exploiting only hyperlinks. A user evaluation of our technique shows that it improves the quality of top link suggestions over the state of the art and that the best predicted links are significantly more valuable than the 'average' link already present in Wikipedia. Beyond link prediction, our algorithm can potentially be used to point out topics an article misses to cover and to cluster articles semantically. Copyright 2009 ACM.
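The central idea in the abstract, predicting links from the statistical structure of the existing link graph via principal component analysis, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a small dense adjacency matrix (rows are source articles, columns are link targets), obtains principal components from NumPy's SVD, and scores each absent link by its value in the low-rank reconstruction. The names `suggest_links`, `n_components`, and `top_n` are hypothetical and chosen only for illustration.

```python
import numpy as np


def suggest_links(adjacency, n_components=2, top_n=3):
    """Rank absent links by their score in a PCA reconstruction.

    Illustrative sketch only (assumed setup, not the paper's exact method):
    adjacency[i][j] == 1 means article i links to article j.
    """
    A = np.asarray(adjacency, dtype=float)

    # Center each column so the SVD yields principal components.
    mean = A.mean(axis=0)
    centered = A - mean

    # Rows of vt are the principal directions of the link vectors.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]

    # Project onto the top components and map back to link space.
    reconstructed = centered @ components.T @ components + mean

    # Only absent, non-self links are candidates for suggestion.
    scores = np.where(A == 0, reconstructed, -np.inf)
    np.fill_diagonal(scores, -np.inf)

    # Take the highest-scoring candidates in descending order.
    order = np.argsort(scores, axis=None)[::-1][:top_n]
    rows, cols = np.unravel_index(order, scores.shape)
    return [
        (int(r), int(c), float(scores[r, c]))
        for r, c in zip(rows, cols)
        if np.isfinite(scores[r, c])
    ]
```

Each returned tuple is `(source, target, score)`. A real collection the size of Wikipedia would need a sparse matrix and a truncated decomposition rather than the dense SVD assumed here.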
Bibtextype inproceedings
Doi 10.1145/1645953.1646093
Has author Robert West, Doina Precup, Joelle Pineau
Has extra keyword Dimensionality reduction, Document collection, Graph mining, Human knowledge, Hyperlink structure, Hyperlinks, Link mining, Link prediction, Novel methods, Sheer size, State of the art, Statistical structures, Textual content, User evaluations, Wikipedia, Feature extraction, Hypertext systems, Knowledge management, Principal component analysis
Has keyword Data mining, Graph mining, Link mining, Principal component analysis, Wikipedia
Isbn 9781605585123
Language English
Number of citations by publication 0
Number of references by publication 0
Pages 1097–1106
Published in International Conference on Information and Knowledge Management, Proceedings
Title Completing Wikipedia's hyperlink structure through dimensionality reduction
Type conference paper
Year 2009
Creation date 7 November 2014 13:17:51
Categories Publications without license parameter, Publications without remote mirror parameter, Publications without archive mirror parameter, Publications without paywall mirror parameter, Conference papers, Publications without references parameter, Publications
Modification date 7 November 2014 13:17:51
Date 2009