Link analysis

From WikiPapers

Link analysis is included as a keyword or extra keyword in 0 datasets, 0 tools and 24 publications.

Datasets

There are no datasets for this keyword.

Tools

There are no tools for this keyword.


Publications

Title Author(s) Published in Language Date Abstract R C
MIGSOM: A SOM algorithm for large scale hyperlinked documents inspired by neuronal migration Kotaro Nakayama
Yutaka Matsuo
Lecture Notes in Computer Science English 2014 The SOM (Self Organizing Map), one of the most popular unsupervised machine learning algorithms, maps high-dimensional vectors into low-dimensional data (usually a 2-dimensional map). The SOM is widely known as a "scalable" algorithm because of its capability to handle large numbers of records. However, it is effective only when the vectors are small and dense. Although a number of studies on making the SOM scalable have been conducted, technical issues on scalability and performance for sparse high-dimensional data such as hyperlinked documents still remain. In this paper, we introduce MIGSOM, an SOM algorithm inspired by a new discovery on neuronal migration. The two major advantages of MIGSOM are its scalability for sparse high-dimensional data and its clustering visualization functionality. In this paper, we describe the algorithm and implementation in detail, and show the practicality of the algorithm in several experiments. We applied MIGSOM not only to experimental data sets but also to a large-scale real data set: Wikipedia's hyperlink data. 0 0
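The MIGSOM algorithm itself is not described above in enough detail to reproduce, but it builds on the standard SOM update rule the abstract refers to. The following is a minimal Python sketch of a plain SOM training loop for orientation only: the grid size, learning-rate and neighborhood schedules are illustrative assumptions, and nothing MIGSOM-specific (the neuronal-migration mechanism) is included.

import numpy as np

def train_som(data, map_h=10, map_w=10, n_iter=1000, lr0=0.5, sigma0=3.0, seed=0):
    """Plain SOM: map high-dimensional vectors onto a 2-D grid of units."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    weights = rng.random((map_h, map_w, dim))          # one weight vector per map unit
    grid = np.stack(np.meshgrid(np.arange(map_h), np.arange(map_w), indexing="ij"), axis=-1)

    for t in range(n_iter):
        lr = lr0 * np.exp(-t / n_iter)                 # decaying learning rate
        sigma = sigma0 * np.exp(-t / n_iter)           # shrinking neighborhood radius
        x = data[rng.integers(len(data))]              # pick a random input vector

        # best matching unit = map unit whose weight vector is closest to the input
        dists = np.linalg.norm(weights - x, axis=2)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)

        # pull the BMU and its grid neighbors toward the input
        grid_dist = np.linalg.norm(grid - np.array(bmu), axis=2)
        influence = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))
        weights += lr * influence[..., None] * (x - weights)

    return weights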
A generalized flow-based method for analysis of implicit relationships on wikipedia Xiaodan Zhang
Yasuhito Asano
Masatoshi Yoshikawa
IEEE Transactions on Knowledge and Data Engineering English 2013 We focus on measuring relationships between pairs of objects in Wikipedia whose pages can be regarded as individual objects. Two kinds of relationships between two objects exist: in Wikipedia, an explicit relationship is represented by a single link between the two pages for the objects, and an implicit relationship is represented by a link structure containing the two pages. Some of the previously proposed methods for measuring relationships are cohesion-based methods, which underestimate objects having high degrees, although such objects could be important in constituting relationships in Wikipedia. The other methods are inadequate for measuring implicit relationships because they use only one or two of the following three important factors: distance, connectivity, and cocitation. We propose a new method using a generalized maximum flow which reflects all three factors and does not underestimate objects having high degrees. We confirm through experiments that our method can measure the strength of a relationship more appropriately than these previously proposed methods do. Another remarkable aspect of our method is mining elucidatory objects, that is, objects constituting a relationship. We explain that mining elucidatory objects would open a novel way to deeply understand a relationship. 0 0
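The paper measures relationship strength with a generalized maximum flow that combines distance, connectivity, and cocitation. The sketch below is not the authors' method: it uses an ordinary maximum flow from networkx and only approximates the distance factor by decaying edge capacities with distance from the source page; the base_capacity and decay parameters are illustrative assumptions.

import networkx as nx

def relation_strength(link_graph, source, target, base_capacity=1.0, decay=0.8):
    # Rough illustration only: ordinary max flow over the hyperlink graph,
    # with capacities decayed by distance from the source page to stand in
    # for the paper's generalized max flow (which uses gain factors instead).
    if source not in link_graph:
        return 0.0
    dist = nx.single_source_shortest_path_length(link_graph, source)
    g = nx.DiGraph()
    for u, v in link_graph.edges():
        if u in dist:                      # keep only edges reachable from the source
            g.add_edge(u, v, capacity=base_capacity * (decay ** dist[u]))
    if source not in g or target not in g:
        return 0.0
    value, _ = nx.maximum_flow(g, source, target)
    return value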
How do metrics of link analysis correlate to quality, relevance and popularity in Wikipedia? Hanada R.T.S.
Marco Cristo
Pimentel M.D.G.C.
WebMedia 2013 - Proceedings of the 19th Brazilian Symposium on Multimedia and the Web English 2013 Many links between Web pages can be viewed as indicative of the quality and importance of the pages they point to. Accordingly, several studies have proposed metrics based on links to infer Web page content quality. However, as far as we know, the only work that has examined the correlation between such metrics and content quality consisted of a limited study that left many open questions. Although these metrics have been shown to be successful in ranking pages provided as answers to queries submitted to search engines, it is not possible to determine the specific contribution of factors such as quality, popularity, and importance to the results. This difficulty is partially due to the fact that such information is hard to obtain for Web pages in general. Unlike ordinary Web pages, the quality, importance and popularity of Wikipedia articles are evaluated by human experts or might be easily estimated. Thus, it is feasible to verify the relation between link analysis metrics and such factors in Wikipedia articles, our goal in this work. To accomplish that, we implemented several link analysis algorithms and compared their resulting rankings with the ones created by human evaluators regarding factors such as quality, popularity and importance. We found that the metrics are more correlated with quality and popularity than with importance, and the correlation is moderate. 0 0
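As a rough illustration of the kind of correlation study this abstract describes, the sketch below computes a single link-analysis metric (PageRank) over an article link graph and measures its rank correlation with human quality ratings. The quality_ratings mapping is a hypothetical input, and the paper evaluates several metrics and factors, not just PageRank.

import networkx as nx
from scipy.stats import spearmanr

def correlate_pagerank_with_quality(link_graph, quality_ratings):
    """Correlate one link-analysis metric (PageRank) with human ratings.

    link_graph: nx.DiGraph of article-to-article links.
    quality_ratings: assumed dict mapping article title -> human-assigned score.
    """
    pr = nx.pagerank(link_graph, alpha=0.85)
    articles = [a for a in quality_ratings if a in pr]
    metric = [pr[a] for a in articles]
    quality = [quality_ratings[a] for a in articles]
    rho, p_value = spearmanr(metric, quality)   # rank correlation, as in this kind of study
    return rho, p_value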
Mutual Evaluation of Editors and Texts for Assessing Quality of Wikipedia Articles Yu Suzuki
Masatoshi Yoshikawa
WikiSym English August 2012 In this paper, we propose a method to identify good-quality Wikipedia articles by mutually evaluating editors and texts. A major approach for assessing article quality is a text survival ratio based approach. In this approach, when a text survives beyond multiple edits, the text is assessed as good quality. This approach assumes that poor-quality texts are deleted by editors with high probability. However, many vandals frequently delete good-quality texts, so the survival ratios of good-quality texts are improperly decreased by vandals. As a result, many good-quality texts are unfairly assessed as poor quality. In our method, we consider editor quality when calculating text quality, and decrease the impact of low-quality editors (vandals) on text quality. With this improvement, the accuracy of the text quality values should be improved. However, an inherent problem of this idea is that the editor qualities are calculated from the text qualities. To solve this problem, we mutually calculate the editor and text qualities until they converge. In our experimental evaluation, we confirmed that the proposed method could accurately assess text quality. 0 0
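The abstract describes a mutual-reinforcement computation: text quality depends on editor quality and vice versa, iterated until convergence. The sketch below illustrates that loop under assumed data structures (per-text author, deleting editors, and survival counts taken from an edit history); the authors' exact formulas are not reproduced.

def mutual_quality(texts, n_iter=100, tol=1e-6):
    """Generic mutual-reinforcement sketch of the idea in the abstract.

    texts: assumed list of dicts like
      {"author": "Alice", "deleted_by": ["Bob", "Mallory"], "survived_revisions": 40},
    built from an article's edit history.
    """
    editors = {t["author"] for t in texts} | {d for t in texts for d in t["deleted_by"]}
    editor_q = {e: 1.0 for e in editors}

    for _ in range(n_iter):
        # Text quality: survival, where each deletion only counts as much as the
        # quality of the editor who performed it (deletions by vandals count less).
        text_q = []
        for t in texts:
            weighted_deletions = sum(editor_q[d] for d in t["deleted_by"])
            text_q.append(t["survived_revisions"] /
                          (t["survived_revisions"] + weighted_deletions + 1e-9))

        # Editor quality: average quality of the texts that editor authored.
        new_editor_q = {}
        for e in editors:
            own = [q for q, t in zip(text_q, texts) if t["author"] == e]
            new_editor_q[e] = sum(own) / len(own) if own else editor_q[e]

        delta = max(abs(new_editor_q[e] - editor_q[e]) for e in editors)
        editor_q = new_editor_q
        if delta < tol:          # stop once editor qualities have converged
            break

    return editor_q, text_q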
A technique for suggesting related Wikipedia articles using link analysis Markson C.
Song M.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries English 2012 With more than 3.7 million articles, Wikipedia has become an important social medium for sharing knowledge. However, with this enormous repository of information, it can often be difficult to locate fundamental topics that support lower-level articles. By exploiting the information stored in the links between articles, we propose that related companion articles can be automatically generated to help further the reader's understanding of a given topic. This approach to a recommendation system uses tested link analysis techniques to present users with a clear path to related high-level articles, furthering the understanding of low-level topics. 0 0
Evaluating reranking methods based on link co-occurrence and category in Wikipedia Takiguchi Y.
Kurakado K.
Oishi T.
Koshimura M.
Fujita H.
Hasegawa R.
ICAART 2012 - Proceedings of the 4th International Conference on Agents and Artificial Intelligence English 2012 We often use search engines to find appropriate documents on the Web. However, it is often the case that we cannot find the desired information easily with a single query. In this paper, we present a method to extract words related to the query by using various features of Wikipedia and rank learning. We aim at developing a system that assists the user in retrieving Web pages by reranking search results. 0 0
Mutual evaluation of editors and texts for assessing quality of Wikipedia articles Yu Suzuki
Masatoshi Yoshikawa
WikiSym 2012 English 2012 In this paper, we propose a method to identify good-quality Wikipedia articles by mutually evaluating editors and texts. A major approach for assessing article quality is a text survival ratio based approach. In this approach, when a text survives beyond multiple edits, the text is assessed as good quality. This approach assumes that poor-quality texts are deleted by editors with high probability. However, many vandals frequently delete good-quality texts, so the survival ratios of good-quality texts are improperly decreased by vandals. As a result, many good-quality texts are unfairly assessed as poor quality. In our method, we consider editor quality when calculating text quality, and decrease the impact of low-quality editors (vandals) on text quality. With this improvement, the accuracy of the text quality values should be improved. However, an inherent problem of this idea is that the editor qualities are calculated from the text qualities. To solve this problem, we mutually calculate the editor and text qualities until they converge. In our experimental evaluation, we confirmed that the proposed method could accurately assess text quality. 0 0
QualityRank: Assessing quality of wikipedia articles by mutually evaluating editors and texts Yu Suzuki
Masatoshi Yoshikawa
HT'12 - Proceedings of the 23rd ACM Conference on Hypertext and Social Media English 2012 In this paper, we propose a method to identify high-quality Wikipedia articles by mutually evaluating editors and texts. A major approach for assessing articles using edit history is a text survival ratio based approach. However, the problem is that many high-quality articles are identified as low quality: because many vandals delete high-quality texts, the survival ratios of high-quality texts are decreased by vandals. Our approach's strongest point is its resistance to vandalism. In our method, text quality values are calculated using editor quality values, so vandals do not affect the quality values of other editors and the accuracy of text quality values should improve. However, the problem is that editor quality values are calculated from text quality values, and text quality values are calculated from editor quality values. To solve this problem, we mutually calculate editor and text quality values until they converge. Using this method, we can calculate a quality value for a text that takes the quality of its editors into consideration. From an experimental evaluation, we confirmed that the proposed method can improve the accuracy of quality values for articles. 0 0
Topic crawling strategy based on Wikipedia and analysis of pages' similarity Xuan Zhao
LeBo Liu
Dingquan Wang
Zheng J.
Applied Mechanics and Materials English 2012 Considering the weaknesses of existing topic crawling strategies, this paper puts forward a new method based on Wikipedia and the analysis of page similarity. First, the topic is described via Wikipedia. Then, the downloaded Web pages are processed. Finally, the priorities of the links are calculated through text relevance and analysis of the Web links. The results indicate that the new method outperforms traditional approaches in terms of search results and topic relevance, and is worth popularizing. 0 0
A self organizing document map algorithm for large scale hyperlinked data inspired by neuronal migration Kotaro Nakayama
Yutaka Matsuo
Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011 English 2011 Web document clustering is a research topic that continues to be pursued because of its large variety of applications. Since Web documents usually have variety and diversity in terms of domains, content and quality, one of the technical difficulties is to find a reasonable number and size of clusters. In this research, we pay attention to SOMs (Self Organizing Maps) because of their capability of visualized clustering that helps users to investigate characteristics of data in detail. The SOM is widely known as a "scalable" algorithm because of its capability to handle large numbers of records. However, it is effective only when the vectors are small and dense. Although several research efforts on making the SOM scalable have been conducted, technical issues on scalability and performance for sparse high-dimensional data such as hyperlinked documents still remain. In this paper, we introduce MIGSOM, an SOM algorithm inspired by a recent discovery on neuronal migration. The two major advantages of MIGSOM are its scalability for sparse high-dimensional data and its clustering visualization functionality. In this paper, we describe the algorithm and implementation, and show the practicality of the algorithm by applying MIGSOM to a large-scale real data set: Wikipedia's hyperlink data. 0 0
Evaluating reranking methods using wikipedia features Kurakado K.
Oishi T.
Hasegawa R.
Fujita H.
Koshimura M.
ICAART 2011 - Proceedings of the 3rd International Conference on Agents and Artificial Intelligence English 2011 Many people these days access a vast number of documents on the Web with the help of search engines such as Google. However, even when we use a search engine, it is often the case that we cannot find the desired information easily. In this paper, we extract words related to the search query by analyzing link information and category structure. We aim to assist the user in retrieving Web pages by reranking search results. 0 0
A classification algorithm of signed networks based on link analysis Qu Z.
Yafang Wang
Wang J.
Zhang F.
Qin Z.
2010 International Conference on Communications, Circuits and Systems, ICCCAS 2010 - Proceedings English 2010 In signed networks, the links between nodes can be either positive (the relation is friendship) or negative (the relation is rivalry or confrontation), which is very useful for analyzing real social networks. After studying data sets from the Wikipedia and Slashdot networks, we find that the signs of links in these underlying social networks can be used to classify nodes and to forecast, with high accuracy, the signs of links that may emerge in the future, using models built across these diverse data sets. Based on these models, the algorithm proposed in this work provides insight into some of the underlying principles extracted from signed links in the networks. At the same time, the algorithm sheds light on social computing applications in which the attitude of one person toward another can be predicted from evidence provided by the relationships of their surrounding friends. 0 0
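As a rough sketch of the sign-prediction task described above, the code below trains a logistic-regression classifier on simple positive/negative degree features of an edge's endpoints. The feature set and model are assumptions for illustration; the paper's actual features (for example, evidence from surrounding friend triads) and classification algorithm are not reproduced.

import numpy as np
from sklearn.linear_model import LogisticRegression

def sign_features(graph_signs, edge):
    """Illustrative features for edge (u, v): counts of positive and negative
    links incident to each endpoint, excluding the edge being predicted."""
    u, v = edge
    feats = []
    for node in (u, v):
        pos = neg = 0
        for (a, b), s in graph_signs.items():
            if (a, b) == edge:
                continue                  # do not leak the label we want to predict
            if node in (a, b):
                pos += s > 0
                neg += s < 0
        feats += [pos, neg]
    return feats

def train_sign_classifier(graph_signs):
    """graph_signs: dict mapping (u, v) -> +1 or -1, e.g. Wikipedia votes or
    Slashdot friend/foe links."""
    X = np.array([sign_features(graph_signs, e) for e in graph_signs])
    y = np.array(list(graph_signs.values()))
    return LogisticRegression(max_iter=1000).fit(X, y)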
Analysis of implicit relations on wikipedia: Measuring strength through mining elucidatory objects Xiaodan Zhang
Yasuhito Asano
Masatoshi Yoshikawa
Lecture Notes in Computer Science English 2010 We focus on measuring relations between pairs of objects in Wikipedia whose pages can be regarded as individual objects. Two kinds of relations between two objects exist: in Wikipedia, an explicit relation is represented by a single link between the two pages for the objects, and an implicit relation is represented by a link structure containing the two pages. Previously proposed methods are inadequate for measuring implicit relations because they use only one or two of the following three important factors: distance, connectivity, and co-citation. We propose a new method reflecting all three factors by using a generalized maximum flow. We confirm that our method can measure the strength of a relation more appropriately than these previously proposed methods do. Another remarkable aspect of our method is mining elucidatory objects, that is, objects constituting a relation. We explain that mining elucidatory objects opens a novel way to deeply understand a relation. 0 0
Analysis of implicit relations on wikipedia: measuring strength through mining elucidatory objects Xinpeng Zhang
Yasuhito Asano
Masatoshi Yoshikawa
DASFAA English 2010 0 0
Mining and explaining relationships in Wikipedia Xiaodan Zhang
Yasuhito Asano
Masatoshi Yoshikawa
Lecture Notes in Computer Science English 2010 Mining and explaining relationships between objects are challenging tasks in the field of knowledge search. We propose a new approach to these tasks using disjoint paths formed by links in Wikipedia. To realize this approach, we propose a naive method and a generalized flow based method, together with a technique for avoiding flow confluences in order to force a generalized flow to be as disjoint as possible. We also apply the approach to the classification of relationships. Our experiments reveal that the generalized flow based method can mine many disjoint paths important for a relationship, and that the classification is effective for explaining relationships. 0 0
Mining and explaining relationships in wikipedia Xinpeng Zhang
Yasuhito Asano
Masatoshi Yoshikawa
DEXA English 2010 0 0
Automatic quality assessment of content created collaboratively by web communities: a case study of Wikipedia Daniel H. Dalip
Marcos A. Gonçalves
Marco Cristo
Pável Calado
English 2009 The old dream of a universal repository containing all human knowledge and culture is becoming possible through the Internet and the Web. Moreover, this is happening with the direct, collaborative participation of people. Wikipedia is a great example. It is an enormous repository of information with free access and editing, created by the community in a collaborative manner. However, this large amount of information, made available democratically and virtually without any control, raises questions about its relative quality. In this work we explore a significant number of quality indicators, some of them proposed by us and used here for the first time, and study their capability to assess the quality of Wikipedia articles. Furthermore, we explore machine learning techniques to combine these quality indicators into a single assessment judgment. Through experiments, we show that the most important quality indicators are the easiest ones to extract, namely, textual features related to length, structure and style. We were also able to determine which indicators did not contribute significantly to the quality assessment. These were, coincidentally, the most complex features, such as those based on link analysis. Finally, we compare our combination method with a state-of-the-art solution and show significant improvements in terms of effective quality prediction. 0 3
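The abstract reports that cheap textual indicators of length, structure, and style were the most useful quality features. The sketch below shows what such a feature extractor and a learned combiner might look like; the specific features, the regressor choice, and the use of wikitext markers like "==" and "[[" are illustrative assumptions, not the paper's feature set or model.

import re
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def text_features(wikitext):
    """A few cheap length/structure/style indicators of the kind the abstract
    highlights; the paper's full feature set is much larger."""
    words = re.findall(r"\w+", wikitext)
    sentences = [s for s in re.split(r"[.!?]+", wikitext) if s.strip()]
    return [
        len(wikitext),                        # character length
        len(words),                           # word count
        len(sentences),                       # sentence count
        len(words) / max(len(sentences), 1),  # average sentence length
        wikitext.count("=="),                 # rough section-heading marker count
        wikitext.count("[["),                 # internal link count
    ]

def train_quality_model(articles, quality_scores):
    """articles: list of wikitext strings; quality_scores: human quality ratings."""
    X = np.array([text_features(a) for a in articles])
    y = np.array(quality_scores)
    return RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)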
Improving the extraction of bilingual terminology from Wikipedia Maike Erdmann
Kotaro Nakayama
Takahiro Hara
Shojiro Nishio
ACM Trans. Multimedia Comput. Commun. Appl. English 2009 Research on the automatic construction of bilingual dictionaries has achieved impressive results. Bilingual dictionaries are usually constructed from parallel corpora, but since these corpora are available only for selected text domains and language pairs, the potential of other resources is being explored as well. In this article, we want to further pursue the idea of using Wikipedia as a corpus for bilingual terminology extraction. We propose a method that extracts term-translation pairs from different types of Wikipedia link information. After that, an SVM classifier trained on the features of manually labeled training data determines the correctness of unseen term-translation pairs. 0 0
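A minimal sketch of the final classification step described in this abstract: an SVM decides whether a candidate term-translation pair is correct, given numeric features derived from Wikipedia link information. The feature layout, kernel, and cross-validation step are assumptions; the paper's exact features and training setup are not reproduced.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def train_translation_classifier(features, labels):
    """features: one row of link-derived scores per candidate term-translation
    pair (e.g. interlanguage-link, redirect, and anchor-text evidence);
    labels: 1 for correct pairs, 0 otherwise, from manually labeled data."""
    features = np.asarray(features)
    labels = np.asarray(labels)
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")
    cv_accuracy = cross_val_score(clf, features, labels, cv=5).mean()  # sanity check
    clf.fit(features, labels)
    return clf, cv_accuracy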
A model for Ranking entities and its application to Wikipedia Gianluca Demartini
Firan C.S.
Tereza Iofciu
Ralf Krestel
Wolfgang Nejdl
Proceedings of the Latin American Web Conference, LA-WEB 2008 English 2008 Entity Ranking (ER) is a recently emerging search task in Information Retrieval, where the goal is not finding documents matching the query words, but instead finding entities which match types and attributes mentioned in the query. In this paper we propose a formal model to define entities as well as a complete ER system, providing examples of its application to enterprise, Web, and Wikipedia scenarios. Since searching for entities on Web scale repositories is an open challenge as the effectiveness of ranking is usually not satisfactory, we present a set of algorithms based on our model and evaluate their retrieval effectiveness. The results show that combining simple Link Analysis, Natural Language Processing, and Named Entity Recognition methods improves retrieval performance of entity search by over 53% for P@10 and 35% for MAP. 0 0
Automatic Wikibook prototyping Chou J.-L.
Wu S.-H.
Proceedings - ICCE 2008: 16th International Conference on Computers in Education English 2008 Wikipedia is the world's largest collaboratively edited source of encyclopedic knowledge. Wikibook is a sub-project of Wikipedia. The purpose of Wikibook is to enable a free textbook to be edited by various contributors, in the same way that Wikipedia is composed and edited. However, editing a book requires more effort than editing separate articles. Therefore, how to help users cooperatively edit a book is a new research issue. In this paper, we investigate how to automatically extract content from Wikipedia and generate a prototype of a Wikibook. Applying search technology, our system can retrieve relevant articles from Wikipedia. A table of contents is built automatically based on link analysis. Our experiment shows that, given a topic, our system can generate a table of contents, which can be treated as a prototype of a Wikibook. 0 0
Overview of the TREC 2008 enterprise track Balog K.
Soboroff I.
Thomas P.
Bailey P.
Nick Craswell
De Vries A.P.
NIST Special Publication English 2008 The fourth year of the enterprise track has featured the same tasks and collection as in the 2007 edition: document and expert search on the CERC corpus. Topics have been extracted from a log of real email enquiries. The only difference compared to the previous year is that both tasks were judged by participants. Although disagreements between assessors do exist, these do not have a large effect on the rankings of systems for either of the tasks. Common themes for this year's document search task included query expansion using external sources (He et al., 2008; Peng and Mao, 2008), exploiting expertise profiles (Balog and de Rijke, 2008; Cummins and O'Riordan, 2008), and leveraging link-structure in the form of in-degree (Zhu, 2008), out-degree (Wu et al., 2008), or PageRank (Xue et al., 2008; Nemirovsky and Avrachenkov, 2008). The best performing document search run employed a query performance predictor mechanism to selectively apply collection enrichment (i.e., query expansion) based on Wikipedia on a per-query basis; retrieval was performed using the Divergence From Randomness framework (He et al., 2008). As to expert search, methods and approaches employed this year included special treatment of different types of person occurrences (Shen et al., 2008; Yao et al., 2008; Jiang et al., 2008), link analysis (Xue et al., 2008; Zhu, 2008), proximity-based techniques (Balog and de Rijke, 2008; He et al., 2008; Zhu, 2008), the use of external evidence (Balog and de Rijke, 2008; He et al., 2008; Serdyukov et al., 2008), and the combination of candidate- and document-based methods (Balog and de Rijke, 2008; Xue et al., 2008). The best performing expert search run used a Language Modeling framework to combine three models: a proximity-based candidate model, a document-based model, and a Web-based variation of the candidate model (Balog and de Rijke, 2008). The Enterprise Track was introduced in 2005, and after four successful years, it came to an end in 2008. Since its introduction, the track, and especially the expert finding task, has generated a lot of interest within the research community, with rapid progress being made in terms of algorithms, modeling, and evaluation. Table 6 lists the tasks featured at the Enterprise track throughout the years. The Entity Search Track, implemented at TREC 2009, can be seen as a continuation of the expert search task, extending it along two dimensions: type (from people-only to multiple types of entities) and scale (from Intranet to Web). 0 0
Semantic relatedness metric for Wikipedia concepts based on link analysis and its application to word sense disambiguation Denis Turdakov
Pavel Velikhov
CEUR Workshop Proceedings English 2008 Wikipedia has grown into a high-quality, up-to-date knowledge base and can enable many knowledge-based applications, which rely on semantic information. One of the most general and quite powerful semantic tools is a measure of semantic relatedness between concepts. Moreover, the ability to efficiently produce a list of ranked similar concepts for a given concept is very important for a wide range of applications. We propose to use a simple measure of similarity between Wikipedia concepts, based on Dice's measure, and provide very efficient heuristic methods to compute top-k ranking results. Furthermore, since our heuristics are based on statistical properties of scale-free networks, we show that these heuristics are applicable to other complex ontologies. Finally, in order to evaluate the measure, we have used it to solve the problem of word sense disambiguation. Our approach to word sense disambiguation is based solely on the similarity measure and produces results with high accuracy. 0 1
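The abstract's core measure is Dice's coefficient over Wikipedia link information. Below is a minimal sketch assuming each concept is represented by the set of pages it links to; the paper's scale-free-network heuristics for efficient top-k retrieval are replaced here by a brute-force scan, which is only illustrative.

def dice_relatedness(links_a, links_b):
    """Dice's coefficient over the link sets of two Wikipedia concepts:
    2 * |A intersect B| / (|A| + |B|)."""
    a, b = set(links_a), set(links_b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

def top_k_related(concept, link_sets, k=10):
    """Rank all other concepts by Dice relatedness to `concept`.
    link_sets: assumed dict mapping concept -> set of linked page titles."""
    scores = [(other, dice_relatedness(link_sets[concept], links))
              for other, links in link_sets.items() if other != concept]
    return sorted(scores, key=lambda x: x[1], reverse=True)[:k]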
Semantically enhanced entity ranking Gianluca Demartini
Firan C.S.
Tereza Iofciu
Wolfgang Nejdl
Lecture Notes in Computer Science English 2008 Users often want to find entities instead of just documents, i.e., finding documents entirely about specific real-world entities rather than general documents where the entities are merely mentioned. Searching for entities on Web scale repositories is still an open challenge as the effectiveness of ranking is usually not satisfactory. Semantics can be used in this context to improve the results leveraging on entity-driven ontologies. In this paper we propose three categories of algorithms for query adaptation, using (1) semantic information, (2) NLP techniques, and (3) link structure, to rank entities in Wikipedia. Our approaches focus on constructing queries using not only keywords but also additional syntactic information, while semantically relaxing the query relying on a highly accurate ontology. The results show that our approaches perform effectively, and that the combination of simple NLP, Link Analysis and semantic techniques improves the retrieval performance of entity search. 0 0
Quantifying the accuracy of relational statements in Wikipedia: A methodology Gabriel Weaver
Barbara Strickland
Gregory Crane
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries English 2006 An initial evaluation of the English Wikipedia indicates that it may provide accurate data for disambiguating and finding relations among named entities. 0 0