Browse wiki

Jump to: navigation, search
Network Analysis for Wikipedia
Abstract Network analysis is concerned with propertNetwork analysis is concerned with properties related to connectivity and distances in graphs, with diverse applications like citation indexing and information retrieval on the Web. HITS (Hyperlink-Induced Topic Search) is a network analysis algorithm that has been successfully used for ranking web pages related to a common topic according to their potential relevance. HITS is based on the notions of hub and authority: a good hub is a page that points to several good authorities; a good authority is a page that is pointed at by several good hubs. HITS exclusively relies on the hyperlink relations existing among the pages, to define the two mutually reinforcing measures of hub and authority. It can be proved that for each page these two weights converge to fixed points, the actual hub and authority values for the page. Authority is used to rank pages resulting from a given query (and thus potentially related to a given topic) in order of relevance. The hyperlinked structure of Wikipedia and the ongoing, incremental editing process behind it make it an interesting and unexplored target domain for network analysis techniques. In particular, we explored the relevance of the notion of HITS's authority on this encyclopedic corpus. We've developed a crawler that extensively scans through the structure of English language Wikipedia articles, and that keeps track for each entry of all other Wikipedia articles pointed at in its definition. The result is a directed graph (roughly 500000 nodes, and more than 8 millions links), which consists for the most part of a big loosely connected component. Then we applied the HITS algorithm to the latter, thus getting a hub and authority weight associated to every entry. First results seem to be meaningful in characterizing the notion of authority in this peculiar domain. Highest-rank authorities seem to be for the most part lexical elements that denote particular and concrete rather than universal and abstract entities. More precisely, at the very top of the authority scale there are concepts used to structure space and time like country names, city names and other geopolitical entities (such as United States and many European countries), historical periods and landmark events (World War II, 1960s). "Television", "scientifc classification" and "animal" are the first three most authoritative common nouns. We will also present the first results issued from the application of well-known PageRank algorithm (Google's popular ranking metrics detailed in 2) to the Wikipedia entries collected by our crawler.ikipedia entries collected by our crawler.
Abstractsub Network analysis is concerned with propertNetwork analysis is concerned with properties related to connectivity and distances in graphs, with diverse applications like citation indexing and information retrieval on the Web. HITS (Hyperlink-Induced Topic Search) is a network analysis algorithm that has been successfully used for ranking web pages related to a common topic according to their potential relevance. HITS is based on the notions of hub and authority: a good hub is a page that points to several good authorities; a good authority is a page that is pointed at by several good hubs. HITS exclusively relies on the hyperlink relations existing among the pages, to define the two mutually reinforcing measures of hub and authority. It can be proved that for each page these two weights converge to fixed points, the actual hub and authority values for the page. Authority is used to rank pages resulting from a given query (and thus potentially related to a given topic) in order of relevance. The hyperlinked structure of Wikipedia and the ongoing, incremental editing process behind it make it an interesting and unexplored target domain for network analysis techniques. In particular, we explored the relevance of the notion of HITS's authority on this encyclopedic corpus. We've developed a crawler that extensively scans through the structure of English language Wikipedia articles, and that keeps track for each entry of all other Wikipedia articles pointed at in its definition. The result is a directed graph (roughly 500000 nodes, and more than 8 millions links), which consists for the most part of a big loosely connected component. Then we applied the HITS algorithm to the latter, thus getting a hub and authority weight associated to every entry. First results seem to be meaningful in characterizing the notion of authority in this peculiar domain. Highest-rank authorities seem to be for the most part lexical elements that denote particular and concrete rather than universal and abstract entities. More precisely, at the very top of the authority scale there are concepts used to structure space and time like country names, city names and other geopolitical entities (such as United States and many European countries), historical periods and landmark events (World War II, 1960s). "Television", "scientifc classification" and "animal" are the first three most authoritative common nouns. We will also present the first results issued from the application of well-known PageRank algorithm (Google's popular ranking metrics detailed in 2) to the Wikipedia entries collected by our crawler.ikipedia entries collected by our crawler.
Bibtextype inproceedings  +
Has author Bellomi + , Francesco + , Roberto Bonato +
Has keyword Link-mining + , Wikipedia +
Has remote mirror http://www.fran.it/articles/wikimania_bellomi_bonato.pdf  +
Number of citations by publication 3  +
Number of references by publication 0  +
Published in Proceedings of Wikimania 2005, Frankfurt, Germany. +
Title Network Analysis for Wikipedia +
Type conference paper  +
Year 2005 +
Creation dateThis property is a special property in this wiki. 20 September 2014 16:57:20  +
Categories Publications without language parameter  + , Publications without license parameter  + , Publications without DOI parameter  + , Publications without archive mirror parameter  + , Publications without paywall mirror parameter  + , Conference papers  + , Publications without references parameter  + , Publications  +
Modification dateThis property is a special property in this wiki. 20 September 2014 16:57:20  +
DateThis property is a special property in this wiki. 2005  +
hide properties that link here 
Network Analysis for Wikipedia + Title
 

 

Enter the name of the page to start browsing from.