From WikiPapers
Jump to: navigation, search

disambiguation is included as keyword or extra keyword in 0 datasets, 0 tools and 11 publications.


There is no datasets for this keyword.


There is no tools for this keyword.


Title Author(s) Published in Language DateThis property is a special property in this wiki. Abstract R C
An automatic sameAs link discovery from Wikipedia Kagawa K.
Susumu Tamagawa
Takahira Yamaguchi
Lecture Notes in Computer Science English 2014 Spelling variants of words or word sense ambiguity takes many costs in such processes as Data Integration, Information Searching, data pre-processing for Data Mining, and so on. It is useful to construct relations between a word or phrases and a representative name of the entity to meet these demands. To reduce the costs, this paper discusses how to automatically discover "sameAs" and "meaningOf" links from Japanese Wikipedia. In order to do so, we gathered relevant features such as IDF, string similarity, number of hypernym, and so on. We have identified the link-based score on salient features based on SVM results with 960,000 anchor link pairs. These case studies show us that our link discovery method goes well with more than 70% precision/ recall rate. 0 0
Wikimantic: Toward effective disambiguation and expansion of queries Boston C.
Fang H.
Carberry S.
Wu H.
Xiaojiang Liu
Data and Knowledge Engineering English 2014 This paper presents an implemented and evaluated methodology for disambiguating terms in search queries and for augmenting queries with expansion terms. By exploiting Wikipedia articles and their reference relations, our method is able to disambiguate terms in particularly short queries with few context words and to effectively expand queries for retrieval of short documents such as tweets. Our strategy can determine when a sequence of words should be treated as a single entity rather than as a sequence of individual entities. This work is part of a larger project to retrieve information graphics in response to user queries. © 2013 Elsevier B.V. 0 0
An open-source toolkit for mining Wikipedia Milne D.
Witten I.H.
Artificial Intelligence English 2013 The online encyclopedia Wikipedia is a vast, constantly evolving tapestry of interlinked articles. For developers and researchers it represents a giant multilingual database of concepts and semantic relations, a potential resource for natural language processing and many other research areas. This paper introduces the Wikipedia Miner toolkit, an open-source software system that allows researchers and developers to integrate Wikipedia's rich semantics into their own applications. The toolkit creates databases that contain summarized versions of Wikipedia's content and structure, and includes a Java API to provide access to them. Wikipedia's articles, categories and redirects are represented as classes, and can be efficiently searched, browsed, and iterated over. Advanced features include parallelized processing of Wikipedia dumps, machine-learned semantic relatedness measures and annotation features, and XML-based web services. Wikipedia Miner is intended to be a platform for sharing data mining techniques. © 2012 Elsevier B.V. All rights reserved. 0 1
Evaluating entity linking with wikipedia Ben Hachey
Will Radford
Joel Nothman
Matthew Honnibal
Curran J.R.
Artificial Intelligence English 2013 Named Entity Linking (nel) grounds entity mentions to their corresponding node in a Knowledge Base (kb). Recently, a number of systems have been proposed for linking entity mentions in text to Wikipedia pages. Such systems typically search for candidate entities and then disambiguate them, returning either the best candidate or nil. However, comparison has focused on disambiguation accuracy, making it difficult to determine how search impacts performance. Furthermore, important approaches from the literature have not been systematically compared on standard data sets. We reimplement three seminal nel systems and present a detailed evaluation of search strategies. Our experiments find that coreference and acronym handling lead to substantial improvement, and search strategies account for much of the variation between systems. This is an interesting finding, because these aspects of the problem have often been neglected in the literature, which has focused largely on complex candidate ranking algorithms. © 2012 Elsevier B.V. All rights reserved. 0 0
News auto-tagging using Wikipedia Eldin S.S.
El-Beltagy S.R.
2013 9th International Conference on Innovations in Information Technology, IIT 2013 English 2013 This paper presents an efficient method for automatically annotating Arabic news stories with tags using Wikipedia. The idea of the system is to use Wikipedia article names, properties, and re-directs to build a pool of meaningful tags. Sophisticated and efficient matching methods are then used to detect text fragments in input news stories that correspond to entries in the constructed tag pool. Generated tags represent real life entities or concepts such as the names of popular places, known organizations, celebrities, etc. These tags can be used indirectly by a news site for indexing, clustering, classification, statistics generation or directly to give a news reader an overview of news story contents. Evaluation of the system has shown that the tags it generates are better than those generated by MSN Arabic news. 0 0
Wikimantic: Disambiguation for short queries Boston C.
Carberry S.
Fang H.
Lecture Notes in Computer Science English 2012 This paper presents an implemented and evaluated methodology for disambiguating terms in search queries. By exploiting Wikipedia articles and their reference relations, our method is able to disambiguate terms in particularly short queries with few context words. This work is part of a larger project to retrieve information graphics in response to user queries. 0 0
Entity disambiguation with hierarchical topic models Kataria S.S.
Kumar K.S.
Rastogi R.
Sen P.
Sengamedu S.H.
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining English 2011 Disambiguating entity references by annotating them with unique ids from a catalog is a critical step in the enrichment of unstructured content. In this paper, we show that topic models, such as Latent Dirichlet Allocation (LDA) and its hierarchical variants, form a natural class of models for learning accurate entity disambiguation models from crowd-sourced knowledge bases such as Wikipedia. Our main contribution is a semi-supervised hierarchical model called Wikipedia-based Pachinko Allocation Model (WPAM) that exploits: (1) All words in the Wikipedia corpus to learn word-entity associations (unlike existing approaches that only use words in a small fixed window around annotated entity references in Wikipedia pages), (2) Wikipedia annotations to appropriately bias the assignment of entity labels to annotated (and co-occurring unannotated) words during model learning, and (3) Wikipedia's category hierarchy to capture co-occurrence patterns among entities. We also propose a scheme for pruning spurious nodes from Wikipedia's crowd-sourced category hierarchy. In our experiments with multiple real-life datasets, we show that WPAM outperforms state-of-the-art baselines by as much as 16% in terms of disambiguation accuracy. Copyright 2011 ACM. 0 0
A ranking approach to target detection for automatic link generation He J.
Maarten de Rijke
SIGIR 2010 Proceedings - 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval English 2010 We focus on the task of target detection in automatic link generation with Wikipedia, i.e., given an N-gram in a snippet of text, find the relevant Wikipedia concepts that explain or provide background knowledge for it. We formulate the task as a ranking problem and investigate the effectiveness of learning to rank approaches and of the features that we use to rank the target concepts for a given N-gram. Our experiments show that learning to rank approaches outperform traditional binary classification approaches. Also, our proposed features are effective both in binary classification and learning to rank settings. 0 0
Using co-occurrence models for placename disambiguation Simon Overell
Stefan Ruger
International Journal of Geographical Information Science English 2008 This paper describes the generation of a model capturing information on how placenames co-occur together. The advantages of the co-occurrence model over traditional gazetteers are discussed and the problem of placename disambiguation is presented as a case study. We begin by outlining the problem of ambiguous placenames. We demonstrate how analysis of Wikipedia can be used in the generation of a co-occurrence model. The accuracy of our model is compared to a handcrafted ground truth; then we evaluate alternative methods of applying this model to the disambiguation of placenames in free text (using the GeoCLEF evaluation forum). We conclude by showing how the inclusion of placenames in both the text and geographic parts of a query provides the maximum mean average precision and outline the benefits of a co-occurrence model as a data source for the wider field of geographic information retrieval (GIR). 0 0
Geographic co-occurrence as a tool for GIR. Overell
Simon E.
Stefan Ruger
4th ACM workshop on Geographical Information Retrieval. Lisbon, Portugal. 2007 In this paper we describe the development of a geographic co-occurrence model and how it can be applied to geographic information retrieval. The model consists of mining co-occurrences of placenames from Wikipedia, and then mapping these placenames to locations in the Getty Thesaurus of Geographical Names. We begin by quantifying the accuracy of our model and compute theoretical bounds for the accuracy achievable when applied to placename disambiguation in free text. We conclude with a discussion of the improvement such a model could provide for placename disambiguation and geographic relevance ranking over traditional methods. 0 0
Measuring Wikipedia Jakob Voss International Conference of the International Society for Scientometrics and Informetrics English 2005 Wikipedia, an international project that uses Wiki software to collaboratively create an encyclopaedia, is becoming more and more popular. Everyone can directly edit articles and every edit is recorded. The version history of all articles is freely available and allows a multitude of examinations. This paper gives an overview on Wikipedia research. Wikipedia's fundamental components, i.e. articles, authors, edits, and links, as well as content and quality are analysed. Possibilities of research are explored including examples and first results. Several characteristics that are found in Wikipedia, such as exponential growth and scale-free networks are already known in other context. However the Wiki architecture also possesses some intrinsic specialties. General trends are measured that are typical for all Wikipedias but vary between languages in detail. 12 16