Semantic relatedness

From WikiPapers

Semantic relatedness is included as a keyword or extra keyword in 0 datasets, 0 tools and 85 publications.

Datasets

There are no datasets for this keyword.

Tools

There are no tools for this keyword.


Publications

Title | Author(s) | Published in | Language | Date | Abstract | R | C
Augmenting concept definition in gloss vector semantic relatedness measure using wikipedia articles Pesaranghader A.
Rezaei A.
Lecture Notes in Electrical Engineering English 2014 Semantic relatedness measures are widely used in text mining and information retrieval applications. Considering these automated measures, in this research paper we attempt to improve the Gloss Vector relatedness measure for more accurate estimation of relatedness between two given concepts. Generally, this measure, by constructing concepts definitions (Glosses) from a thesaurus, tries to find the angle between the concepts' gloss vectors for the calculation of relatedness. Nonetheless, this definition construction task is challenging as thesauruses do not provide full coverage of expressive definitions for the particularly specialized concepts. By employing Wikipedia articles and other external resources, we aim at augmenting these concepts' definitions. Applying both definition types to the biomedical domain, using MEDLINE as corpus, UMLS as the default thesaurus, and a reference standard of 68 concept pairs manually rated for relatedness, we show that exploiting available resources on the Web has a positive impact on the final measurement of semantic relatedness. 0 0
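For readers unfamiliar with the Gloss Vector measure referenced above, the following minimal sketch illustrates the general idea: each concept is represented by a second-order co-occurrence vector built from the words of its (possibly Wikipedia-augmented) definition, and relatedness is the cosine of the angle between the two vectors. The toy corpus, glosses and helper names are illustrative assumptions, not the authors' implementation.

<syntaxhighlight lang="python">
from collections import Counter, defaultdict
from math import sqrt

def word_cooccurrence_vectors(corpus_sentences):
    """First-order co-occurrence vectors: word -> Counter of neighbouring words."""
    vectors = defaultdict(Counter)
    for sentence in corpus_sentences:
        tokens = sentence.lower().split()
        for i, w in enumerate(tokens):
            for other in tokens[:i] + tokens[i + 1:]:
                vectors[w][other] += 1
    return vectors

def gloss_vector(gloss, word_vectors):
    """Second-order gloss vector: sum of the co-occurrence vectors of the gloss words."""
    vec = Counter()
    for w in gloss.lower().split():
        vec.update(word_vectors.get(w, {}))
    return vec

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na, nb = sqrt(sum(v * v for v in a.values())), sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Illustrative data only; a real system would use MEDLINE sentences and
# UMLS glosses augmented with Wikipedia article text, as in the paper.
corpus = ["the heart pumps blood through the body",
          "blood pressure is measured in arteries"]
word_vectors = word_cooccurrence_vectors(corpus)
g1 = gloss_vector("organ that pumps blood", word_vectors)
g2 = gloss_vector("pressure of blood in arteries", word_vectors)
print(cosine(g1, g2))
</syntaxhighlight>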
Exploiting Wikipedia for Evaluating Semantic Relatedness Mechanisms Ferrara F.
Tasso C.
Communications in Computer and Information Science English 2014 The semantic relatedness between two concepts is a measure that quantifies the extent to which two concepts are semantically related. In the area of digital libraries, several mechanisms based on semantic relatedness methods have been proposed. Visualization interfaces, information extraction mechanisms, and classification approaches are just some examples of mechanisms where semantic relatedness methods can play a significant role and were successfully integrated. Due to the growing interest of researchers in areas like Digital Libraries, Semantic Web, Information Retrieval, and NLP, various approaches have been proposed for automatically computing the semantic relatedness. However, despite the growing number of proposed approaches, there are still significant criticalities in evaluating the results returned by different methods. The limitations of existing evaluation mechanisms prevent an effective evaluation, and several works in the literature emphasize that the exploited approaches are rather inconsistent. In order to overcome this limitation, we propose a new evaluation methodology where people provide feedback about the semantic relatedness between concepts explicitly defined in digital encyclopedias. In this paper, we specifically exploit Wikipedia for generating a reliable dataset. 0 0
Graph-based domain-specific semantic relatedness from Wikipedia Sajadi A. Lecture Notes in Computer Science English 2014 Human made ontologies and lexicons are promising resources for many text mining tasks in domain specific applications, but they do not exist for most domains. We study the suitability of Wikipedia as an alternative resource for ontologies regarding the Semantic Relatedness problem. We focus on the biomedical domain because (1) high quality manually curated ontologies are available and (2) successful graph based methods have been proposed for semantic relatedness in this domain. Because Wikipedia is not hierarchical and links do not convey defined semantic relationships, the same methods used on lexical resources (such as WordNet) cannot be applied here straightforwardly. Our contributions are (1) Demonstrating that Wikipedia based methods outperform state of the art ontology based methods on most of the existing ontologies in the biomedical domain (2) Adapting and evaluating the effectiveness of a group of bibliometric methods of various degrees of sophistication on Wikipedia for the first time (3) Proposing a new graph-based method that is outperforming existing methods by considering some specific features of Wikipedia structure. 0 0
Learning to compute semantic relatedness using knowledge from wikipedia Zheng C.
Zhe Wang
Bie R.
Zhou M.
Lecture Notes in Computer Science English 2014 Recently, Wikipedia has become a very important resource for computing semantic relatedness (SR) between entities. Several approaches have already been proposed to compute SR based on Wikipedia. Most of the existing approaches use certain kinds of information in Wikipedia (e.g. links, categories, and texts) and compute the SR by empirically designed measures. We have observed that these approaches produce very different results for the same entity pair in some cases. Therefore, how to select appropriate features and measures to best approximate the human judgment on SR becomes a challenging problem. In this paper, we propose a supervised learning approach for computing SR between entities based on Wikipedia. Given two entities, our approach first maps entities to articles in Wikipedia; then different kinds of features of the mapped articles are extracted from Wikipedia, which are then combined with different relatedness measures to produce nine raw SR values of the entity pair. A supervised learning algorithm is proposed to learn the optimal weights of different raw SR values. The final SR is computed as the weighted average of raw SRs. Experiments on benchmark datasets show that our approach outperforms baseline methods. 0 0
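The final step described above (learning weights for several raw SR values and combining them as a weighted average) can be sketched with ordinary least squares, as below. The nine-feature setup and the random data are illustrative assumptions; the paper's actual learning algorithm and features may differ.

<syntaxhighlight lang="python">
import numpy as np

# Each row holds the raw SR values produced by different feature/measure
# combinations for one entity pair; y holds the human relatedness judgments.
# Nine columns mirror the "nine raw SR values" mentioned in the abstract,
# but the numbers here are made up for illustration.
X = np.random.rand(50, 9)          # 50 training pairs, 9 raw SR values each
y = np.random.rand(50)             # human-rated relatedness for those pairs

# Learn one weight per raw SR value with least squares (a simple stand-in
# for the supervised learning algorithm proposed in the paper).
weights, *_ = np.linalg.lstsq(X, y, rcond=None)

def combined_sr(raw_sr_values):
    """Final relatedness as the weighted combination of the raw SR values."""
    return float(np.dot(weights, raw_sr_values))

print(combined_sr(np.random.rand(9)))
</syntaxhighlight>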
Self-sorting map: An efficient algorithm for presenting multimedia data in structured layouts Strong G.
Gong M.
IEEE Transactions on Multimedia English 2014 This paper presents the Self-Sorting Map (SSM), a novel algorithm for organizing and presenting multimedia data. Given a set of data items and a dissimilarity measure between each pair of them, the SSM places each item into a unique cell of a structured layout, where the most related items are placed together and the unrelated ones are spread apart. The algorithm integrates ideas from dimension reduction, sorting, and data clustering algorithms. Instead of solving the continuous optimization problem that other dimension reduction approaches do, the SSM transforms it into a discrete labeling problem. As a result, it can organize a set of data into a structured layout without overlap, providing a simple and intuitive presentation. The algorithm is designed for sorting all data items in parallel, making it possible to arrange millions of items in seconds. Experiments on different types of data demonstrate the SSM's versatility in a variety of applications, ranging from positioning city names by proximities to presenting images according to visual similarities, to visualizing semantic relatedness between Wikipedia articles. 0 0
An approach for deriving semantically related category hierarchies from Wikipedia category graphs Hejazy K.A.
El-Beltagy S.R.
Advances in Intelligent Systems and Computing English 2013 Wikipedia is the largest online encyclopedia known to date. Its rich content and semi-structured nature have made it into a very valuable research tool used for classification, information extraction, and semantic annotation, among others. Many applications can benefit from the presence of a topic hierarchy in Wikipedia. However, what Wikipedia currently offers is a category graph built through hierarchical category links, the semantics of which are undefined. Because of this lack of semantics, a sub-category in Wikipedia does not necessarily comply with the concept of a sub-category in a hierarchy. Instead, all it signifies is that there is some sort of relationship between the parent category and its sub-category. As a result, traversing the category links of any given category can often result in surprising results. For example, following the category of "Computing" down its sub-category links, the totally unrelated category of "Theology" appears. In this paper, we introduce a novel algorithm that through measuring the semantic relatedness between any given Wikipedia category and nodes in its sub-graph is capable of extracting a category hierarchy containing only nodes that are relevant to the parent category. The algorithm has been evaluated by comparing its output with a gold standard data set. The experimental setup and results are presented. 0 0
An open-source toolkit for mining Wikipedia Milne D.
Witten I.H.
Artificial Intelligence English 2013 The online encyclopedia Wikipedia is a vast, constantly evolving tapestry of interlinked articles. For developers and researchers it represents a giant multilingual database of concepts and semantic relations, a potential resource for natural language processing and many other research areas. This paper introduces the Wikipedia Miner toolkit, an open-source software system that allows researchers and developers to integrate Wikipedia's rich semantics into their own applications. The toolkit creates databases that contain summarized versions of Wikipedia's content and structure, and includes a Java API to provide access to them. Wikipedia's articles, categories and redirects are represented as classes, and can be efficiently searched, browsed, and iterated over. Advanced features include parallelized processing of Wikipedia dumps, machine-learned semantic relatedness measures and annotation features, and XML-based web services. Wikipedia Miner is intended to be a platform for sharing data mining techniques. © 2012 Elsevier B.V. All rights reserved. 0 1
Computing semantic relatedness from human navigational paths on wikipedia Singer P.
Niebler T.
Strohmaier M.
Hotho A.
WWW 2013 Companion - Proceedings of the 22nd International Conference on World Wide Web English 2013 This paper presents a novel approach for computing semantic relatedness between concepts on Wikipedia by using human navigational paths for this task. Our results suggest that human navigational paths provide a viable source for calculating semantic relatedness between concepts on Wikipedia. We also show that we can improve accuracy by intelligent selection of path corpora based on path characteristics indicating that not all paths are equally useful. Our work makes an argument for expanding the existing arsenal of data sources for calculating semantic relatedness and to consider the utility of human navigational paths for this task. 0 0
Computing semantic relatedness using Wikipedia features Hadj Taieb M.A.
Ben Aouicha M.
Ben Hamadou A.
Knowledge-Based Systems English 2013 Measuring semantic relatedness is a critical task in many domains such as psychology, biology, linguistics, cognitive science and artificial intelligence. In this paper, we propose a novel system for computing semantic relatedness between words. Recent approaches have exploited Wikipedia as a huge semantic resource that showed good performances. Therefore, we utilized the Wikipedia features (articles, categories, Wikipedia category graph and redirection) in a system combining this Wikipedia semantic information in its different components. The approach is preceded by a pre-processing step to provide for each category pertaining to the Wikipedia category graph a semantic description vector including the weights of stems extracted from articles assigned to the target category. Next, for each candidate word, we collect its categories set using an algorithm for categories extraction from the Wikipedia category graph. Then, we compute the semantic relatedness degree using existing vector similarity metrics (Dice, Overlap and Cosine) and a newly proposed metric that performs as well as the cosine formula. The basic system is followed by a set of modules in order to exploit Wikipedia features to quantify the semantic relatedness between words as accurately as possible. We evaluate our measure based on two tasks: comparison with human judgments using five datasets and a specific application "solving choice problem". The resulting system shows good performance and sometimes outperforms the ESA (Explicit Semantic Analysis) and TSA (Temporal Semantic Analysis) approaches. © 2013 Elsevier B.V. All rights reserved. 0 0
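The standard vector similarity metrics named in the abstract (Dice, Overlap, Cosine) can be written down compactly over weighted stem vectors, as in this sketch. The weighted generalizations used for Dice and Overlap here are common textbook forms and are only assumed to match the paper's exact formulation; the toy vectors are made up.

<syntaxhighlight lang="python">
from math import sqrt

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    return dot / (sqrt(sum(v * v for v in a.values())) *
                  sqrt(sum(v * v for v in b.values())) or 1.0)

def dice(a, b):
    shared = sum(min(a[k], b[k]) for k in a.keys() & b.keys())
    return 2 * shared / (sum(a.values()) + sum(b.values()) or 1.0)

def overlap(a, b):
    shared = sum(min(a[k], b[k]) for k in a.keys() & b.keys())
    return shared / (min(sum(a.values()), sum(b.values())) or 1.0)

# Toy category description vectors (stem -> weight); in the paper these are
# built from the articles assigned to each Wikipedia category.
v1 = {"comput": 0.8, "softwar": 0.5, "program": 0.3}
v2 = {"comput": 0.6, "hardwar": 0.7, "program": 0.2}
for name, f in [("cosine", cosine), ("dice", dice), ("overlap", overlap)]:
    print(name, round(f(v1, v2), 3))
</syntaxhighlight>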
Computing semantic relatedness using word frequency and layout information of wikipedia Chan P.
Hijikata Y.
Nishida S.
Proceedings of the ACM Symposium on Applied Computing English 2013 Computing the semantic relatedness between two words or phrases is an important problem for fields such as information retrieval and natural language processing. One state-of-the-art approach to solve the problem is Explicit Semantic Analysis (ESA). ESA uses the word frequency in Wikipedia articles to estimate the relevance, so the relevance of words with low frequency cannot always be well estimated. To improve the relevance estimate of the low frequency words, we use not only word frequency but also layout information in Wikipedia articles. Empirical evaluation shows that on the low frequency words, our method achieves better estimate of semantic relatedness over ESA. Copyright 2013 ACM. 0 0
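Since Explicit Semantic Analysis (ESA) recurs throughout this list, a minimal sketch of its core idea may help: a text is mapped to a weighted vector over Wikipedia concepts (one dimension per article, weighted by the text's words in that article), and relatedness is the cosine between two such vectors. The tiny inverted index below is a made-up stand-in for a real Wikipedia-derived index and is not part of any of the cited systems.

<syntaxhighlight lang="python">
from collections import defaultdict
from math import sqrt

# word -> {Wikipedia concept: tf-idf weight}; in a real ESA system this
# inverted index is built from the full Wikipedia dump.
INVERTED_INDEX = {
    "jaguar": {"Jaguar (animal)": 0.9, "Jaguar Cars": 0.7},
    "engine": {"Jaguar Cars": 0.8, "Internal combustion engine": 0.9},
    "forest": {"Jaguar (animal)": 0.6, "Rainforest": 0.9},
}

def concept_vector(text):
    """Sum the concept vectors of the text's words (the ESA interpretation vector)."""
    vec = defaultdict(float)
    for word in text.lower().split():
        for concept, weight in INVERTED_INDEX.get(word, {}).items():
            vec[concept] += weight
    return vec

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na, nb = sqrt(sum(v * v for v in a.values())), sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

print(cosine(concept_vector("jaguar engine"), concept_vector("jaguar forest")))
</syntaxhighlight>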
Extracting knowledge from Wikipedia articles through distributed semantic analysis Hieu N.T.
Di Francesco M.
Yla-Jaaski A.
ACM International Conference Proceeding Series English 2013 Computing semantic word similarity and relatedness requires access to vast amounts of semantic space for effective analysis. As a consequence, it is time-consuming to extract useful information from a large amount of data on a single workstation. In this paper, we propose a system, called Distributed Semantic Analysis (DSA), that integrates a distributed-based approach with semantic analysis. DSA builds a list of concept vectors associated with each word by exploiting the knowledge provided by Wikipedia articles. Based on such lists, DSA calculates the degree of semantic relatedness between two words through the cosine measure. The proposed solution is built on top of the Hadoop MapReduce framework and the Mahout machine learning library. Experimental results show two major improvements over the state of the art, with particular reference to the Explicit Semantic Analysis method. First, our distributed approach significantly reduces the computation time to build the concept vectors, thus enabling the use of larger inputs that is the basis for more accurate results. Second, DSA obtains a very high correlation of computed relatedness with reference benchmarks derived by human judgements. Moreover, its accuracy is higher than solutions reported in the literature over multiple benchmarks. 0 0
Measuring semantic relatedness using wikipedia signed network Yang W.-T.
Kao H.-Y.
Journal of Information Science and Engineering English 2013 Identifying the semantic relatedness of two words is an important task for information retrieval, natural language processing, and text mining. However, due to the diversity of meaning for a word, the semantic relatedness of two words is still hard to precisely evaluate under the limited corpora. Wikipedia is now a huge wiki-based encyclopedia on the internet that has become a valuable resource for research work. Wikipedia articles, written by a live collaboration of user editors, contain a high volume of reference links, URL identification for concepts and a complete revision history. Moreover, each Wikipedia article represents an individual concept that simultaneously contains other concepts that are hyperlinks of other articles embedded in its content. Through this, we believe that the semantic relatedness between two words can be found through the semantic relatedness between two Wikipedia articles. Therefore, we propose an Editor-Contribution-based Rank (ECR) algorithm for ranking the concepts in the article's content through all revisions and take the ranked concepts as a vector representing the article. We classify four types of relationship in which the behavior of addition and deletion maps appropriate and inappropriate concepts. ECR also extends the concept semantics through the editor-concept network. ECR ranks those concepts depending on the mutual signed-reinforcement relationship between the concepts and the editors. The results reveal that our method leads to prominent performance improvement and increases the correlation coefficient by a factor ranging from 4% to 23% over previous methods that calculate the relatedness between two articles. 0 0
Related entity finding using semantic clustering based on wikipedia categories Stratogiannis G.
Georgios Siolas
Andreas Stafylopatis
Lecture Notes in Computer Science English 2013 We present a system that performs Related Entity Finding, that is, Question Answering that exploits Semantic Information from the WWW and returns URIs as answers. Our system uses a search engine to gather all candidate answer entities and then a linear combination of Information Retrieval measures to choose the most relevant. For each one we look up its Wikipedia page and construct a novel vector representation based on the tokenization of the Wikipedia category names. This novel representation gives our system the ability to compute a measure of semantic relatedness between entities, even if the entities do not share any common category. We use this property to perform a semantic clustering of the candidate entities and show that the biggest cluster contains entities that are closely related semantically and can be considered as answers to the query. Performance measured on 20 topics from the 2009 TREC Related Entity Finding task shows competitive results. 0 0
Research on measuring semantic correlation based on the Wikipedia hyperlink network Ye F.
Zhang F.
Luo X.
Xu L.
2013 IEEE/ACIS 12th International Conference on Computer and Information Science, ICIS 2013 - Proceedings English 2013 As a free online encyclopedia with large-scale knowledge coverage, rich semantic information and quick update speed, Wikipedia brings new ideas to measure semantic correlation. In this paper, we present a new method for measuring the semantic correlation between words by mining rich semantic information that exists in Wikipedia. Unlike the previous methods that calculate semantic relatedness merely based on the page network or the category network, our method not only takes into account the semantic information of the page network but also combines the semantic information of the category network, which improves the accuracy of the results. Besides, we analyze and evaluate the algorithm by comparing the calculation results with a well-known knowledge base (e.g., Hownet) and traditional methods based on Wikipedia on the same test set, and demonstrate its superiority. 0 0
Semantic relatedness estimation using the layout information of wikipedia articles Chan P.
Hijikata Y.
Kuramochi T.
Nishida S.
International Journal of Cognitive Informatics and Natural Intelligence English 2013 Computing the semantic relatedness between two words or phrases is an important problem in fields such as information retrieval and natural language processing. Explicit Semantic Analysis (ESA), a state-of-the-art approach to solve the problem uses word frequency to estimate relevance. Therefore, the relevance of words with low frequency cannot always be well estimated. To improve the relevance estimate of low-frequency words and concepts, the authors apply regression to word frequency, its location in an article, and its text style to calculate the relevance. The relevance value is subsequently used to compute semantic relatedness. Empirical evaluation shows that, for low-frequency words, the authors' method achieves better estimate of semantic relatedness over ESA. Furthermore, when all words of the dataset are considered, the combination of the authors' proposed method and the conventional approach outperforms the conventional approach alone. Copyright 0 0
Using proximity to compute semantic relatedness in RDF graphs Paulo Leal J. Computer Science and Information Systems English 2013 Extracting the semantic relatedness of terms is an important topic in several areas, including data mining, information retrieval and web recommendation. This paper presents an approach for computing the semantic relatedness of terms in RDF graphs based on the notion of proximity. It proposes a formal definition of proximity in terms of the set of paths connecting two concept nodes, and an algorithm for finding this set and computing proximity with a given error margin. This algorithm was implemented in a tool called Shakti that extracts relevant ontological data for a given domain from DBpedia - a community effort to extract structured data from Wikipedia. To validate the proposed approach, Shakti was used to recommend web pages on a Portuguese social site related to alternative music and the results of that experiment are also reported. 0 0
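The notion of proximity over the set of paths connecting two nodes can be illustrated with a small breadth-first enumeration in which shorter paths contribute more. The decay factor and the toy graph are illustrative assumptions; the paper defines its own formal proximity and an error-bounded algorithm, which this naive sketch does not reproduce.

<syntaxhighlight lang="python">
from collections import deque

def all_paths(graph, start, goal, max_len=4):
    """Enumerate simple paths (up to max_len nodes) in an undirected term graph."""
    paths, queue = [], deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal and len(path) > 1:
            paths.append(path)
            continue
        if len(path) > max_len:
            continue
        for neighbour in graph.get(node, []):
            if neighbour not in path:          # keep paths simple
                queue.append(path + [neighbour])
    return paths

def proximity(graph, a, b, decay=0.5, max_len=4):
    """Shorter connecting paths count more; many short paths -> high proximity."""
    return sum(decay ** (len(p) - 1) for p in all_paths(graph, a, b, max_len))

# Toy RDF-like adjacency (e.g., extracted from DBpedia triples).
graph = {
    "Punk rock": ["Rock music", "The Clash"],
    "Rock music": ["Punk rock", "Guitar"],
    "The Clash": ["Punk rock", "Rock music"],
    "Guitar": ["Rock music"],
}
print(proximity(graph, "Punk rock", "Guitar"))
</syntaxhighlight>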
Wiki3C: Exploiting wikipedia for context-aware concept categorization Jiang P.
Hou H.
Long Chen
Shun-ling Chen
Conglei Yao
Chenliang Li
Wang M.
WSDM 2013 - Proceedings of the 6th ACM International Conference on Web Search and Data Mining English 2013 Wikipedia is an important human generated knowledge base containing over 21 million articles organized by millions of categories. In this paper, we exploit Wikipedia for a new task of text mining: Context-aware Concept Categorization. In the task, we focus on categorizing concepts according to their context. We exploit article link feature and category structure in Wikipedia, followed by introducing Wiki3C, an unsupervised and domain independent concept categorization approach based on context. In the approach, we investigate two strategies to select and filter Wikipedia articles for the category representation. Besides, a probabilistic model is employed to compute the semantic relatedness between two concepts in Wikipedia. Experimental evaluation using manually labeled ground truth shows that our proposed Wiki3C can achieve a noticeable improvement over the baselines without considering contextual information. 0 0
A hybrid method based on WordNet and Wikipedia for computing semantic relatedness between texts Malekzadeh R.
Bagherzadeh J.
Noroozi A.
AISP 2012 - 16th CSI International Symposium on Artificial Intelligence and Signal Processing English 2012 In this article we present a new method for computing semantic relatedness between texts. For this purpose we use a two-phase approach. The first phase involves modeling document sentences as a matrix to compute semantic relatedness between sentences. In the second phase, we compare text relatedness by using the relation of their sentences. Since semantic relations between words must be looked up in a lexical semantic knowledge source, selecting a suitable source is very important, as a correct selection produces more accurate results. In this work, we attempt to capture the semantic relatedness between texts with higher accuracy. For this purpose, we use a combination of two well-known knowledge bases, namely WordNet and Wikipedia, which provides a more complete data source for calculating the semantic relatedness with higher accuracy. We evaluate our approach by comparison with other existing techniques (on Lee datasets). 0 0
Are human-input seeds good enough for entity set expansion? Seeds rewriting by leveraging Wikipedia semantic knowledge Qi Z.
Kang Liu
Jun Zhao
Lecture Notes in Computer Science English 2012 Entity Set Expansion is an important task for open information extraction, which refers to expanding a given partial seed set to a more complete set that belongs to the same semantic class. Much previous research has shown that the quality of seeds strongly influences expansion performance, since human-input seeds may be ambiguous, sparse, etc. In this paper, we propose a novel method which can generate new, high-quality seeds and replace original, poor-quality ones. In our method, we leverage Wikipedia as a semantic knowledge source to measure the semantic relatedness and ambiguity of each seed. Moreover, to avoid the sparseness of the seed, we use web resources to measure its population. Then new seeds are generated to replace original, poor-quality seeds. Experimental results show that new seed sets generated by our method can improve entity expansion performance by up to 9.1% on average over original seed sets. 0 0
Choosing better seeds for entity set expansion by leveraging wikipedia semantic knowledge Qi Z.
Kang Liu
Jun Zhao
Communications in Computer and Information Science English 2012 Entity Set Expansion, which refers to expanding a human-input seed set to a more complete set which belongs to the same semantic category, is an important task for open information extraction. Because human-input seeds may be ambiguous, sparse, etc., the quality of seeds has a great influence on expansion performance, as has been shown by much previous research. To improve seed quality, this paper proposes a novel method which can choose better seeds from the original input ones. In our method, we leverage Wikipedia semantic knowledge to measure the semantic relatedness and ambiguity of each seed. Moreover, to avoid the sparseness of the seed, we use a web corpus to measure its population. Lastly, we use a linear model to combine these factors to determine the final selection. Experimental results show that new seed sets chosen by our method can improve expansion performance by up to 13.4% on average over randomly selected seed sets. 0 0
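The linear model described in the abstract (combining relatedness, ambiguity and population into one seed score) can be sketched as below. The weights, scoring ranges and candidate values are placeholders; the paper derives the individual factors from Wikipedia and web corpus statistics.

<syntaxhighlight lang="python">
def seed_score(relatedness, ambiguity, population,
               w_rel=0.5, w_amb=0.3, w_pop=0.2):
    """Linear combination of seed-quality factors; higher is better.
    Ambiguity hurts, so it enters with a negative contribution."""
    return w_rel * relatedness - w_amb * ambiguity + w_pop * population

# Hypothetical candidate seeds with pre-computed factor scores in [0, 1].
candidates = {
    "Paris":  {"relatedness": 0.9, "ambiguity": 0.8, "population": 0.9},
    "Lyon":   {"relatedness": 0.8, "ambiguity": 0.2, "population": 0.6},
    "Nantes": {"relatedness": 0.7, "ambiguity": 0.1, "population": 0.4},
}
ranked = sorted(candidates, key=lambda c: seed_score(**candidates[c]), reverse=True)
print(ranked)  # better seeds first
</syntaxhighlight>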
Computing text-to-text semantic relatedness based on building and analyzing enriched concept graph Jahanbakhsh Nagadeh Z.
Mahmoudi F.
Jadidinejad A.H.
Lecture Notes in Electrical Engineering English 2012 This paper discusses the effective usage of key concepts in computing text-to-text semantic relatedness. We present a novel method for computing text semantic relatedness by using key concepts. The problem of selecting an appropriate semantic resource is very important in semantic relatedness algorithms. For this purpose, we propose to use a combination of two semantic resources, namely WordNet and Wikipedia, which provides a more complete data source and higher accuracy for calculating the semantic relatedness. As a result, semantic relatedness can be computed between almost any pair of concepts. In the proposed method, a text is modeled as a graph of semantic relatedness between concepts of the text that are extracted from WordNet and Wikipedia. This graph is named the Enriched Concepts Graph (ECG). Then key concepts are extracted by analyzing the ECG. Finally, text semantic relatedness is obtained by comparing the key concepts of the texts. We evaluated our approach and obtained a high correlation coefficient of 0.782, which outperformed all other existing state-of-the-art approaches. © 2012 Springer Science+Business Media B.V. 0 0
Explanatory semantic relatedness and explicit spatialization for exploratory search Brent Hecht
Carton S.H.
Mahmood Quaderi
Johannes Schoning
Raubal M.
Darren Gergle
Doug Downey
SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval English 2012 Exploratory search, in which a user investigates complex concepts, is cumbersome with today's search engines. We present a new exploratory search approach that generates interactive visualizations of query concepts using thematic cartography (e.g. choropleth maps, heat maps). We show how the approach can be applied broadly across both geographic and non-geographic contexts through explicit spatialization, a novel method that leverages any figure or diagram - from a periodic table, to a parliamentary seating chart, to a world map - as a spatial search environment. We enable this capability by introducing explanatory semantic relatedness measures. These measures extend frequently-used semantic relatedness measures to not only estimate the degree of relatedness between two concepts, but also generate human-readable explanations for their estimates by mining Wikipedia's text, hyperlinks, and category structure. We implement our approach in a system called Atlasify, evaluate its key components, and present several use cases. 0 0
Harnessing Wikipedia semantics for computing contextual relatedness Jabeen S.
Gao X.
Andreae P.
Lecture Notes in Computer Science English 2012 This paper proposes a new method of automatically measuring semantic relatedness by exploiting Wikipedia as an external knowledge source. The main contribution of our research is to propose a relatedness measure based on Wikipedia senses and hyperlink structure for computing contextual relatedness of any two terms. We have evaluated the effectiveness of our approach using three datasets and have shown that our approach competes well with other well known existing methods. 0 0
Improving cross-document knowledge discovery using explicit semantic analysis Yan P.
Jin W.
Lecture Notes in Computer Science English 2012 Cross-document knowledge discovery is dedicated to exploring meaningful (but maybe unapparent) information from a large volume of textual data. The sparsity and high dimensionality of text data present great challenges for representing the semantics of natural language. Our previously introduced Concept Chain Queries (CCQ) was specifically designed to discover semantic relationships between two concepts across documents where relationships found reveal semantic paths linking two concepts across multiple text units. However, answering such queries only employed the Bag of Words (BOW) representation in our previous solution, and therefore terms not appearing in the text literally are not taken into consideration. Explicit Semantic Analysis (ESA) is a novel method proposed to represent the meaning of texts in a higher dimensional space of concepts which are derived from large-scale human built repositories such as Wikipedia. In this paper, we propose to integrate the ESA technique into our query processing, which is capable of using vast knowledge from Wikipedia to complement existing information from text corpus and alleviate the limitations resulted from the BOW representation. The experiments demonstrate the search quality has been greatly improved when incorporating ESA into answering CCQ, compared with using a BOW-based approach. 0 0
Integrating semantic relatedness in a collaborative filtering system Ferrara F.
Tasso C.
Proceedings ABIS 2012 - 19th Intl. Workshop on Personalization and Recommendation on the Web and Beyond, Held at Mensch and Computer 2012 English 2012 Collaborative Filtering (CF) recommender systems use opinions of people for filtering relevant information. The accuracy of these applications depends on the mechanism used to filter and combine the opinions (the feedback) provided by users. In this paper we propose a mechanism aimed at using semantic relations extracted from Wikipedia in order to adaptively filter and combine the feedback of people. The semantic relatedness among the concepts/pages of Wikipedia is used to identify the opinions which are more significant for predicting a rating for an item. We show that our approach improves the accuracy of the predictions and it also opens opportunities for providing explanations on the obtained recommendations. 0 0
Measuring entity semantic relatedness using wikipedia Medina L.
Fred A.L.N.
Rodrigues R.
Filipe J.
KDIR 2012 - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval English 2012 In this paper we propose a semantic relatedness measure between scientific concepts, using Wikipedia as a hierarchical taxonomy. The devised measure examines the length of the Wikipedia category path between two concepts, assigning a weight to each category that corresponds to its depth in the hierarchy. This procedure was extended to measure the relatedness between two distinct concept sets (herein referred to as entities), where the amount of shared nodes in the paths computed for all possible concept sets is also integrated in a global relatedness measure index. 0 0
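The depth-weighted category path idea described above can be illustrated as follows: find a path between two concepts in the Wikipedia category hierarchy and weight each category on the path by its depth, so that shared deep (specific) categories contribute more than shallow (general) ones. The toy graph, depths and normalization below are illustrative assumptions, not the paper's exact formula.

<syntaxhighlight lang="python">
import networkx as nx

# Toy fragment of the Wikipedia category hierarchy with per-category depth.
g = nx.Graph()
g.add_edges_from([("Science", "Physics"), ("Physics", "Quantum mechanics"),
                  ("Science", "Chemistry"), ("Chemistry", "Quantum chemistry"),
                  ("Quantum mechanics", "Quantum chemistry")])
depth = {"Science": 1, "Physics": 2, "Chemistry": 2,
         "Quantum mechanics": 3, "Quantum chemistry": 3}

def depth_weighted_relatedness(a, b):
    """Relatedness grows with the depth of categories on the connecting path
    and shrinks with path length (one possible weighting scheme)."""
    path = nx.shortest_path(g, a, b)
    weight = sum(depth[c] for c in path)
    return weight / (len(path) * max(depth.values()))

print(depth_weighted_relatedness("Quantum mechanics", "Quantum chemistry"))
print(depth_weighted_relatedness("Physics", "Chemistry"))
</syntaxhighlight>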
Query directed web page clustering using suffix tree and wikipedia links Jonghun Park
Gao X.
Andreae P.
Lecture Notes in Computer Science English 2012 Recent research on Web page clustering has shown that the user query plays a critical role in guiding the categorisation of web search results. This paper combines our Query Directed Clustering algorithm (QDC) with another existing algorithm, Suffix Tree Clustering (STC), to identify common phrases shared by documents for base cluster identification. One main contribution is the utilising of a new Wikipedia link based measure to estimate the semantic relatedness between query and the base cluster labels, which has shown great promise in identifying the good base clusters. Our experimental results show that the performance is improved by utilising suffix trees and Wikipedia links. 0 0
REWOrD: Semantic relatedness in the web of data Pirro G. Proceedings of the National Conference on Artificial Intelligence English 2012 This paper presents REWOrD, an approach to compute semantic relatedness between entities in the Web of Data representing real word concepts. REWOrD exploits the graph nature of RDF data and the SPARQL query language to access this data. Through simple queries, REWOrD constructs weighted vectors keeping the informativeness of RDF predicates used to make statements about the entities being compared. The most informative path is also considered to further refine informativeness. Relatedness is then computed by the cosine of the weighted vectors. Differently from previous approaches based on Wikipedia, REWOrD does not require any preprocessing or custom data transformation. Indeed, it can leverage whatever RDF knowledge base as a source of background knowledge. We evaluated REWOrD in different settings by using a new dataset of real word entities and investigate its flexibility. As compared to related work on classical datasets, REWOrD obtains comparable results while, on one side, it avoids the burden of preprocessing and data transformation and, on the other side, it provides more flexibility and applicability in a broad range of domains. Copyright © 2012, Association for the Advancement of Artificial Intelligence. All rights reserved. 0 0
The difficulty of path traversal in information networks Takes F.W.
Kosters W.A.
KDIR 2012 - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval English 2012 This paper introduces a set of classification techniques for determining the difficulty - for a human - of path traversal in an information network. In order to ensure the generalizability of our approach, we do not use ontologies or concepts of expected semantic relatedness, but rather focus on local and global structural graph properties and measures to determine the difficulty of finding a certain path. Using a large corpus of over two million traversed paths on Wikipedia, we demonstrate how our techniques are able to accurately assess the human difficulty of finding a path between two articles within an information network. Copyright 0 0
WNavi s: Navigating Wikipedia semantically with an SNA-based summarization technique Wu I.-C.
Lin Y.-S.
Decision Support Systems English 2012 Link-based applications like Wikipedia are becoming increasingly popular because they provide users with an efficient way to find needed knowledge, such as searching for definitions and information about a particular topic, and exploring articles on related topics. This work introduces a semantics-based navigation application called WNavi s, to facilitate information-seeking activities in internal link-based websites in Wikipedia. WNavi s is based on the theories and techniques of link mining, semantic relatedness analysis and text summarization. Our goal is to develop an application that helps users find related articles for a seed query (topic) easily and then quickly check the content of articles to explore a new concept or topic in Wikipedia. Technically, we construct a preliminary topic network by analyzing the internal links of Wikipedia and applying the normalized Google distance algorithm to quantify the strength of the semantic relationships between articles via key terms. Because not all the content of articles in Wikipedia is relevant to users' information needs, it is desirable to locate specific information for users and enable them to quickly explore and read topic-related articles. Accordingly, we propose an SNA-based single and multiple-document summarization technique that can extract meaningful sentences from articles. We applied a number of intrinsic and extrinsic evaluation methods to demonstrate the efficacy of the summarization techniques in terms of precision, and recall. The results suggest that the proposed summarization technique is effective. Our findings have implications for the design of a navigation tool that can help users explore related articles in Wikipedia quickly. © 2012 Elsevier B.V. All rights reserved. 0 0
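The normalized Google distance mentioned above has a standard closed form; applied to Wikipedia it is typically computed from link or article counts rather than search engine hits. The sketch below follows that standard formula with made-up counts; turning the distance into a relatedness score via 1 - NGD is one common convention, not necessarily the one used in this paper.

<syntaxhighlight lang="python">
from math import log

def ngd(f_x, f_y, f_xy, n):
    """Normalized Google Distance from occurrence counts:
    f_x, f_y  - number of articles linking to (or containing) x and y
    f_xy      - number of articles containing both
    n         - total number of articles considered."""
    if f_xy == 0:
        return float("inf")
    return ((max(log(f_x), log(f_y)) - log(f_xy)) /
            (log(n) - min(log(f_x), log(f_y))))

# Hypothetical Wikipedia link statistics.
distance = ngd(f_x=2000, f_y=1500, f_xy=800, n=4_000_000)
relatedness = max(0.0, 1.0 - distance)   # one simple way to turn distance into strength
print(round(distance, 3), round(relatedness, 3))
</syntaxhighlight>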
WSR: A semantic relatedness measure based on Wikipedia structure Sun C.-C.
Shen D.-R.
Shan J.
Nie T.-Z.
Yu G.
Jisuanji Xuebao/Chinese Journal of Computers Chinese 2012 This paper proposes a semantic relatedness measure based on Wikipedia structure: WikiStruRel (WSR). Nowadays, Wikipedia is the largest and the fastest-growing online encyclopedia, consisting of two net-like structures: an article referenced network and a category tree (actually a tree-like graph), which include lots of explicitly defined semantic information. WSR explicitly analyzes the article referenced network and the category tree from Wikipedia and computes semantic relatedness between words. While WSR achieves effective accuracy and large coverage by testing on three common datasets, the measure doesn't have to deal with text, resulting in low cost. 0 0
Web image retrieval re-ranking with Wikipedia semantics Seungwoo Lee
Cho S.
International Journal of Multimedia and Ubiquitous Engineering English 2012 Nowadays, taking advantage of tags is a general tendency when users store or retrieve images on the Web. In this article, we introduce some approaches to calculate the semantic importance of tags attached to Web images, and to re-rank the retrieved images accordingly. We have compared the results from image re-ranking with two semantic providers, WordNet and Wikipedia. With the semantic importance of image tags calculated by using Wikipedia, our experimental results show the superiority of the method in precision and recall. 0 0
Web image retrieval using semantic prior tags Seungwoo Lee
Cho S.
Journal of Convergence Information Technology English 2012 This research concerns the extraction and utilization of semantic information from Web image tags based on Wikipedia. Generally, most photo images stored on the Web have many tags added according to users' subjective judgments rather than their importance. In tagged Web image retrieval, such tags decrease the precision rate. In this paper, we suggest a method that selects prior tags when tagged images are uploaded to online Web image databases, and uses them in image retrieval. This method includes the calculation of semantic relatedness between tags based on Wikipedia for prior tag selection. Also, it is characterized by multilevel search of tagged images with prior tags. For evaluation, we compared our method with Flickr's method, which is a simple matching of tags to a given query. The results show the superiority of our method in precision and recall. 0 0
What is the relationship about? Extracting information about relationships from wikipedia Mathiak B.
Pena V.M.M.
Wira-Alam A.
WEBIST 2012 - Proceedings of the 8th International Conference on Web Information Systems and Technologies English 2012 What is the relationship between terms? Document analysis tells us that "Crime" is close to "Victim" and not so close to "Banana". While for common terms like Sun and Light the nature of the relationship is clear, the measure becomes more fuzzy when dealing with more uncommonly used terms and concepts and partial information. Semantic relatedness is typically calculated from an encyclopedia like Wikipedia, but Wikipedia contains a lot of information that is not common knowledge. So, when a computer calculates that Belarus and Ukraine are closely related, what does it mean to me as a human? In this paper, we take a look at perceived relationship and qualify it in a human-readable way. The result is a search engine, designed to take two terms and explain how they relate to each other. We evaluate this through a user study which gauges how useful this extra information is to humans when making a judgment about relationships. 0 0
A link-based visual search engine for Wikipedia David N. Milne
Ian H. Witten
JCDL English 2011 0 0
A query expansion technique using the EWC semantic relatedness measure Vitaly Klyuev
Haralambous Y.
Informatica (Ljubljana) English 2011 This paper analyses the efficiency of the EWC semantic relatedness measure in an ad-hoc retrieval task. This measure combines the Wikipedia-based Explicit Semantic Analysis (ESA) measure, the WordNet path measure and the mixed collocation index. EWC considers encyclopaedic, ontological, and collocational knowledge about terms. This advantage of EWC is a key factor to find precise terms for automatic query expansion. In the experiments, the open source search engine Terrier is utilised as a tool to index and retrieve data. The proposed technique is tested on the NTCIR data collection. The experiments demonstrated superiority of EWC over ESA. 0 0
An exploratory study of navigating Wikipedia semantically: Model and application Wu I.-C.
Lin Y.-S.
Liu C.-H.
Lecture Notes in Computer Science English 2011 Due to the popularity of link-based applications like Wikipedia, one of the most important issues in online research is how to alleviate information overload on the World Wide Web (WWW) and facilitate effective information-seeking. To address the problem, we propose a semantically-based navigation application that is based on the theories and techniques of link mining, semantic relatedness analysis and text summarization. Our goal is to develop an application that assists users in efficiently finding the related subtopics for a seed query and then quickly checking the content of articles. We establish a topic network by analyzing the internal links of Wikipedia and applying the Normalized Google Distance algorithm in order to quantify the strength of the semantic relationships between articles via key terms. To help users explore and read topic-related articles, we propose a SNA-based summarization approach to summarize articles. To visualize the topic network more efficiently, we develop a semantically-based WikiMap to help users navigate Wikipedia effectively. 0 0
Combining heterogeneous knowledge resources for improved distributional semantic models Szarvas G.
Torsten Zesch
Iryna Gurevych
Lecture Notes in Computer Science English 2011 The Explicit Semantic Analysis (ESA) model based on term cooccurrences in Wikipedia has been regarded as state-of-the-art semantic relatedness measure in the recent years. We provide an analysis of the important parameters of ESA using datasets in five different languages. Additionally, we propose the use of ESA with multiple lexical semantic resources thus exploiting multiple evidence of term cooccurrence to improve over the Wikipedia-based measure. Exploiting the improved robustness and coverage of the proposed combination, we report improved performance over single resources in word semantic relatedness, solving word choice problems, classification of semantic relations between nominals, and text similarity. 0 0
Concept-based document classification using Wikipedia and value function Pekka Malo
Ankur Sinha
Jyrki Wallenius
Pekka Korhonen
Journal of the American Society for Information Science and Technology English 2011 In this article, we propose a new concept-based method for document classification. The conceptual knowledge associated with the words is drawn from Wikipedia. The purpose is to utilize the abundant semantic relatedness information available in Wikipedia in an efficient value function-based query learning algorithm. The procedure learns the value function by solving a simple linear programming problem formulated using the training documents. The learning involves a step-wise iterative process that helps in generating a value function with an appropriate set of concepts (dimensions) chosen from a collection of concepts. Once the value function is formulated, it is utilized to make a decision between relevance and irrelevance. The value assigned to a particular document from the value function can be further used to rank the documents according to their relevance. Reuters newswire documents have been used to evaluate the efficacy of the procedure. An extensive comparison with other frameworks has been performed. The results are promising. 0 0
Constraint optimization approach to context based word selection Matsuno J.
Toru Ishida
IJCAI International Joint Conference on Artificial Intelligence English 2011 Consistent word selection in machine translation is currently realized by resolving word sense ambiguity through the context of a single sentence or neighboring sentences. However, consistent word selection over the whole article has yet to be achieved. Consistency over the whole article is extremely important when applying machine translation to collectively developed documents like Wikipedia. In this paper, we propose to consider constraints between words in the whole article based on their semantic relatedness and contextual distance. The proposed method is successfully implemented in both statistical and rule-based translators. We evaluate those systems by translating 100 articles in the English Wikipedia into Japanese. The results show that the ratio of appropriate word selection for common nouns increased to around 75% with our method, while it was around 55% without our method. 0 0
Cross-lingual recommendations in a resource-based learning scenario Schmidt S.
Scholl P.
Rensing C.
Steinmetz R.
Lecture Notes in Computer Science English 2011 CROKODIL is a platform supporting resource-based learning scenarios for self-directed, on-task learning with web resources. As CROKODIL enables the forming of possibly large learning communities, the stored data is growing in a large scale. Thus, an appropriate recommendation of tags and learning resources becomes increasingly important for supporting learners. We propose semantic relatedness between tags and resources as a basis of recommendation and identify Explicit Semantic Analysis (ESA) using Wikipedia as reference corpus as a viable option. However, data from CROKODIL shows that tags and resources are often composed in different languages. Thus, a monolingual approach to provide recommendations is not applicable in CROKODIL. Thus, we examine strategies for providing mappings between different languages, extending ESA to provide cross-lingual capabilities. Specifically, we present mapping strategies that utilize additional semantic information contained in Wikipedia. Based on CROKODIL's application scenario, we present an evaluation design and show results of cross-lingual ESA. 0 0
Defining ontology by using users collaboration on social media Kamran S.
Crestani F.
English 2011 This novel method is proposed for building a reliable ontology around specific concepts, by using the immense potential of active volunteering collaboration of detected knowledgeable users on social media. Copyright 2011 ACM. 0 0
Document Topic Extraction Based on Wikipedia Category Jiali Yun
Liping Jing
Jian Yu
Houkuan Huang
Ying Zhang
CSO English 2011 0 0
Harnessing different knowledge sources to measure semantic relatedness under a uniform model Zhang Z.
Gentile A.L.
Ciravegna F.
EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference English 2011 Measuring semantic relatedness between words or concepts is a crucial process for many Natural Language Processing tasks. Existing methods exploit semantic evidence from a single knowledge source, and are predominantly evaluated only in the general domain. This paper introduces a method of harnessing different knowledge sources under a uniform model for measuring semantic relatedness between words or concepts. Using Wikipedia and WordNet as examples, and evaluated in both the general and biomedical domains, it successfully combines strengths from both knowledge sources and outperforms the state of the art on many datasets. 0 0
Measuring Semantic Relatedness Using Wikipedia Revision Information in a Signed Network Wen-Teng Yang
Hung-Yu Kao
TAAI English 2011 0 0
Multipedia: Enriching DBpedia with multimedia information Garcia-Silva A.
Max Jakob
Mendes P.N.
Christian Bizer
KCAP 2011 - Proceedings of the 2011 Knowledge Capture Conference English 2011 Enriching knowledge bases with multimedia information makes it possible to complement textual descriptions with visual and audio information. Such complementary information can help users to understand the meaning of assertions, and in general improve the user experience with the knowledge base. In this paper we address the problem of how to enrich ontology instances with candidate images retrieved from existing Web search engines. DBpedia has evolved into a major hub in the Linked Data cloud, interconnecting millions of entities organized under a consistent ontology. Our approach taps into the Wikipedia corpus to gather context information for DBpedia instances and takes advantage of image tagging information when this is available to calculate semantic relatedness between instances and candidate images. We performed experiments with focus on the particularly challenging problem of highly ambiguous names. Both methods presented in this work outperformed the baseline. Our best method leveraged context words from Wikipedia, tags from Flickr and type information from DBpedia to achieve an average precision of 80%. 0 0
Relational similarity measure: An approach combining Wikipedia and wordnet Cao Y.J.
Lu Z.
Cai S.M.
Applied Mechanics and Materials English 2011 Relational similarities between two pairs of words are the degrees of their semantic relations. The Vector Space Model (VSM) is used to measure the relational similarity between two pairs of words; however, it requires patterns to be created manually and these patterns are limited. Recently, Latent Relational Analysis (LRA) was proposed and achieves state-of-the-art results. However, it is time-consuming and cannot express implicit semantic relations. In this study, we propose a new approach to measure relational similarities between two pairs of words by combining WordNet 3.0 and the Web-based Wikipedia, so that implicit semantic relations can be mined from a very large corpus. The proposed approach has two main characteristics: (1) A new method is proposed in the pattern extraction step, which considers various parts of speech of words. (2) WordNet 3.0 is applied to calculate the semantic relatedness between a pair of words so that the implicit semantic relation of the two words can be expressed. In an experimental evaluation based on the 374 SAT multiple-choice word-analogy questions, the precision of the proposed approach is 43.9%, which is lower than that of LRA as proposed by Turney in 2005, but the suggested approach mainly focuses on mining the semantic relations among words. 0 0
Semantic relatedness for named entity disambiguation using a small wikipedia Izaskun Fernandez
Iñaki Alegria
Nerea Ezeiza
TSD English 2011 0 0
Semantic relatedness measurement based on Wikipedia link co-occurrence analysis Masahiro Ito
Kotaro Nakayama
Takahiro Hara
Shojiro Nishio
International Journal of Web Information Systems English 2011 Purpose: Recently, the importance and effectiveness of Wikipedia Mining has been shown in several studies. One popular research area on Wikipedia Mining focuses on semantic relatedness measurement, and research in this area has shown that Wikipedia can be used for semantic relatedness measurement. However, previous methods face two problems: accuracy and scalability. To solve these problems, the purpose of this paper is to propose an efficient semantic relatedness measurement method that leverages global statistical information of Wikipedia. Furthermore, a new test collection is constructed based on Wikipedia concepts for evaluating semantic relatedness measurement methods. Design/methodology/approach: The authors' approach leverages global statistical information of the whole Wikipedia to compute semantic relatedness among concepts (disambiguated terms) by analyzing co-occurrences of link pairs in all Wikipedia articles. In Wikipedia, an article represents a concept and a link to another article represents a semantic relation between these two concepts. Thus, the co-occurrence of a link pair indicates the relatedness of a concept pair. Furthermore, the authors propose an integration method with tfidf as an improved method to additionally leverage local information in an article. Besides, for constructing a new test collection, the authors select a large number of concepts from Wikipedia. The relatedness of these concepts is judged by human test subjects. Findings: An experiment was conducted for evaluating calculation cost and accuracy of each method. The experimental results show that the calculation cost of this approach is very low compared to one of the previous methods, and that it is more accurate than all previous methods for computing semantic relatedness. Originality/value: This is the first proposal of co-occurrence analysis of Wikipedia links for semantic relatedness measurement. The authors show that this approach is effective for measuring semantic relatedness among concepts regarding calculation cost and accuracy. The findings may be useful to researchers who are interested in knowledge extraction, as well as ontology research. 0 0
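The core statistic of this approach, counting how often pairs of links co-occur within Wikipedia articles, can be sketched as below; a normalized co-occurrence count then serves as the relatedness of the two linked concepts. The Dice-style normalization and the toy link sets are assumptions for illustration; the paper combines global co-occurrence statistics with tfidf in a more elaborate way.

<syntaxhighlight lang="python">
from collections import Counter
from itertools import combinations

# Each article is reduced to the set of articles it links to.
article_links = {
    "Apple Inc.": {"Steve Jobs", "IPhone", "Cupertino"},
    "IPhone": {"Apple Inc.", "Steve Jobs", "Smartphone"},
    "Smartphone": {"IPhone", "Android (operating system)"},
}

pair_counts, single_counts = Counter(), Counter()
for links in article_links.values():
    single_counts.update(links)
    for a, b in combinations(sorted(links), 2):
        pair_counts[(a, b)] += 1

def relatedness(x, y):
    """Dice-style normalization of link-pair co-occurrence counts."""
    pair = tuple(sorted((x, y)))
    co = pair_counts[pair]
    total = single_counts[x] + single_counts[y]
    return 2 * co / total if total else 0.0

print(relatedness("Steve Jobs", "IPhone"))
print(relatedness("Steve Jobs", "Smartphone"))
</syntaxhighlight>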
Automatically acquiring a semantic network of related concepts Szumlanski S.
Gomez F.
International Conference on Information and Knowledge Management, Proceedings English 2010 We describe the automatic construction of a semantic network, in which over 3000 of the most frequently occurring monosemous nouns in Wikipedia (each appearing between 1,500 and 100,000 times) are linked to their semantically related concepts in the WordNet noun ontology. Relatedness between nouns is discovered automatically from cooccurrence in Wikipedia texts using an information-theoretic inspired measure. Our algorithm then capitalizes on salient sense clustering among related nouns to automatically disambiguate them to their appropriate senses (i.e., concepts). Through the act of disambiguation, we begin to accumulate relatedness data for concepts denoted by polysemous nouns, as well. The resultant concept-to-concept associations, covering 17,543 nouns, and 27,312 distinct senses among them, constitute a large-scale semantic network of related concepts that can be conceived of as augmenting the WordNet noun ontology with related-to links. 0 0
Computing semantic relatedness between named entities using Wikipedia Hongyan Liu
Yirong Chen
Proceedings - International Conference on Artificial Intelligence and Computational Intelligence, AICI 2010 English 2010 In this paper the authors suggest a novel approach that uses Wikipedia to measure the semantic relatedness between Chinese named entities, such as names of persons, books, softwares, etc. The relatedness is measured through articles in Wikipedia that are related to the named entities. The authors select a set of "definition words" which are hyperlinks from these articles, and then compute the relatedness between two named entities as the relatedness between two sets of definition words. The authors propose two ways to measure the relatedness between two definition words: by Wiki-articles related to the words or by categories of the words. Proposed approaches are compared with several other baseline models through experiments. The experimental results show that this method renders satisfactory results. 0 0
Educational Tool Based on Topology and Evolution of Hyperlinks in the Wikipedia Lauri Lahti ICALT English 2010 0 0
Educational tool based on topology and evolution of hyperlinks in the Wikipedia Lauri Lahti Proceedings - 10th IEEE International Conference on Advanced Learning Technologies, ICALT 2010 English 2010 We propose a new method to support educational exploration in the hyperlink network of the Wikipedia online encyclopedia. The learner is provided with alternative parallel ranking lists, each one promoting hyperlinks that represent a different pedagogical perspective to the desired learning topic. The learner can browse the conceptual relations between the latest versions of articles or the conceptual relations belonging to consecutive temporal versions of an article, or a mixture of both approaches. Based on her needs and intuition, the learner explores hyperlink network and meanwhile the method builds automatically concept maps that reflect her conceptualization process and can be used for varied educational purposes. Initial experiments with a prototype tool based on the method indicate enhancement to ordinary learning results and suggest further research. 0 0
Efficient Wikipedia-based semantic interpreter by exploiting top-k processing Kim J.W.
Ashwin Kashyap
Deyi Li
Sandilya Bhamidipati
International Conference on Information and Knowledge Management, Proceedings English 2010 Proper representation of the meaning of texts is crucial to enhancing many data mining and information retrieval tasks, including clustering, computing semantic relatedness between texts, and searching. Representing texts in the concept-space derived from Wikipedia has received growing attention recently, due to its comprehensiveness and expertise. This concept-based representation is capable of extracting semantic relatedness between texts that cannot be deduced with the bag-of-words model. A key obstacle to using Wikipedia as a semantic interpreter, however, is that the sheer number of concepts derived from Wikipedia makes it hard to efficiently map texts into concept-space. In this paper, we develop an efficient algorithm which is able to represent the meaning of a text by using the concepts that best match it. In particular, our approach first computes the approximate top-k concepts that are most relevant to the given text. We then leverage these concepts for representing the meaning of the given text. The experimental results show that the proposed technique provides significant gains in execution time over current solutions to the problem. 0 0
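A minimal sketch of mapping a text onto its top-k most relevant Wikipedia concepts, in the spirit of the approximate top-k interpretation described above. The inverted index, its weights, and the scoring are toy assumptions; the paper's pruning and approximation strategies are not reproduced here.

```python
import heapq
from collections import defaultdict

# Toy inverted index: term -> {Wikipedia concept: weight}, as a concept-space
# interpreter would precompute from article text (all values are assumptions).
inverted_index = {
    "jaguar": {"Jaguar (animal)": 0.9, "Jaguar Cars": 0.8, "Panthera": 0.4},
    "speed":  {"Jaguar Cars": 0.5, "Cheetah": 0.7},
}

def top_k_concepts(terms, k=2):
    """Accumulate concept scores over the input terms and keep only the k best."""
    scores = defaultdict(float)
    for term in terms:
        for concept, weight in inverted_index.get(term, {}).items():
            scores[concept] += weight
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

print(top_k_concepts(["jaguar", "speed"]))  # [('Jaguar Cars', 1.3), ('Jaguar (animal)', 0.9)]
```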
Exploring the semantics behind a collection to improve automated image annotation Llorente A.
Motta E.
Stefan Ruger
Lecture Notes in Computer Science English 2010 The goal of this research is to explore several semantic relatedness measures that help to refine annotations generated by a baseline non-parametric density estimation algorithm. Thus, we analyse the benefits of performing a statistical correlation using the training set or using the World Wide Web versus approaches based on a thesaurus like WordNet or Wikipedia (considered as a hyperlink structure). Experiments are carried out using the dataset provided by the 2009 edition of the ImageCLEF competition, a subset of the MIR-Flickr 25k collection. The best results correspond to the approaches based on statistical correlation, as they do not depend on a prior disambiguation phase, unlike the WordNet- and Wikipedia-based approaches. Further work needs to be done to assess whether proper disambiguation schemas might improve their performance. 0 0
Extended explicit semantic analysis for calculating semantic relatedness of web resources Scholl P.
Bohnstedt D.
Dominguez Garcia R.
Rensing C.
Steinmetz R.
Lecture Notes in Computer Science English 2010 Finding semantically similar documents is a common task in recommender systems. Explicit Semantic Analysis (ESA) is an approach to calculate semantic relatedness between terms or documents based on similarities to documents of a reference corpus, where Wikipedia is usually used as the reference corpus. We propose enhancements to ESA (called Extended Explicit Semantic Analysis) that make use of further semantic properties of Wikipedia, such as the article link structure and categorization, thus utilizing the additional semantic information that is included in Wikipedia. We show how we apply this approach to the recommendation of web resource fragments in a resource-based learning scenario for self-directed, on-task learning with web resources. 0 0
Semantic relatedness approach for named entity disambiguation Gentile A.L.
Zhang Z.
Linsi Xia
Iria J.
Communications in Computer and Information Science English 2010 Natural language is a means to express and discuss concepts, objects, and events, i.e., it carries semantic content. One of the ultimate aims of Natural Language Processing techniques is to identify the meaning of the text, providing effective ways to make a proper linkage between textual references and their referents, that is, real-world objects. This work addresses the problem of giving a sense to proper names in a text, that is, automatically associating words representing Named Entities with their referents. The proposed methodology for Named Entity Disambiguation is based on Semantic Relatedness Scores obtained with a graph-based model over Wikipedia. We show that, without building a Bag of Words representation of the text, but only considering named entities within the text, the proposed paradigm achieves results competitive with the state of the art on two different datasets. 0 0
The tower of Babel meets web 2.0: User-generated content and its applications in a multilingual context Brent Hecht
Darren Gergle
Conference on Human Factors in Computing Systems - Proceedings English 2010 This study explores language's fragmenting effect on user-generated content by examining the diversity of knowledge representations across 25 different Wikipedia language editions. This diversity is measured at two levels: the concepts that are included in each edition and the ways in which these concepts are described. We demonstrate that the diversity present is greater than has been presumed in the literature and has a significant influence on applications that use Wikipedia as a source of world knowledge. We close by explicating how knowledge diversity can be beneficially leveraged to create "culturally-aware applications" and "hyperlingual applications". 0 2
Using passage-based language model for opinion detection in blogs Saad Missen M.M.
Boughanem M.
Cabanac G.
Proceedings of the ACM Symposium on Applied Computing English 2010 In this work, we evaluate the importance of passages in blogs, especially when dealing with the task of opinion detection. We argue that passages are the basic building blocks of blogs. Therefore, we use a passage-based language modeling approach for opinion finding in blogs. Our decision to use language modeling (LM) in this work is based on the performance LM has shown in various opinion detection approaches. In addition, we propose a novel method for bi-dimensional query expansion with relevant and opinionated terms, using Wikipedia and a relevance-feedback mechanism respectively. We also compare the impact of two different query term weighting (and ranking) approaches on the final results. Besides this, we compare the performance of three passage-based document ranking functions (Linear, Avg, Max). For evaluation purposes, we use the TREC Blog06 data collection with 50 topics from TREC 2006, against the best TREC-provided baseline (baseline4), which has an opinion-finding MAP of 0.3022. Our approach gives a MAP improvement of almost 9.29% over this baseline. 0 0
Wisdom of crowds versus wisdom of linguists - Measuring the semantic relatedness of words Torsten Zesch
Iryna Gurevych
Natural Language Engineering English 2010 In this article, we present a comprehensive study aimed at computing the semantic relatedness of word pairs. We analyze the performance of a large number of semantic relatedness measures proposed in the literature with respect to different experimental conditions, such as (i) the datasets employed, (ii) the language (English or German), (iii) the underlying knowledge source, and (iv) the evaluation task (computing scores of semantic relatedness, ranking word pairs, solving word choice problems). To our knowledge, this study is the first to systematically analyze semantic relatedness on a large number of datasets with different properties, while emphasizing the role of the knowledge source compiled either by the wisdom of linguists (i.e., classical wordnets) or by the wisdom of crowds (i.e., collaboratively constructed knowledge sources like Wikipedia). The article discusses benefits and drawbacks of different approaches to evaluating semantic relatedness. We show that results should be interpreted carefully to evaluate particular aspects of semantic relatedness. For the first time, we apply a vector-based measure of semantic relatedness, relying on a concept space built from documents, to the first paragraph of Wikipedia articles, to English WordNet glosses, and to GermaNet-based pseudo glosses. Contrary to previous research (Strube and Ponzetto 2006; Gabrilovich and Markovitch 2007; Zesch et al. 2007), we find that wisdom-of-crowds-based resources are not superior to wisdom-of-linguists-based resources. We also find that using the first paragraph of a Wikipedia article as opposed to the whole article leads to better precision, but decreases recall. Finally, we present two systems that were developed to aid the experiments presented herein and are freely available for research purposes: (i) DEXTRACT, a software to semi-automatically construct corpus-driven semantic relatedness datasets, and (ii) JWPL, a Java-based high-performance Wikipedia Application Programming Interface (API) for building natural language processing (NLP) applications. 0 0
A study on the semantic relatedness of query and document terms in information retrieval Muller C.
Iryna Gurevych
EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009 English 2009 The use of lexical semantic knowledge in information retrieval has been a field of active study for a long time. Collaborative knowledge bases like Wikipedia and Wiktionary, which have been applied in computational methods only recently, offer new possibilities to enhance information retrieval. In order to find the most beneficial way to employ these resources, we analyze the lexical semantic relations that hold among query and document terms and compare how these relations are represented by a measure for semantic relatedness. We explore the potential of different indicators of document relevance that are based on semantic relatedness and compare the characteristics and performance of the knowledge bases Wikipedia, Wiktionary and WordNet. 0 0
Cross-lingual semantic relatedness using encyclopedic knowledge Hassan S.
Rada Mihalcea
EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009 English 2009 In this paper, we address the task of crosslingual semantic relatedness. We introduce a method that relies on the information extracted from Wikipedia, by exploiting the interlanguage links available between Wikipedia versions in multiple languages. Through experiments performed on several language pairs, we show that the method performs well, with a performance comparable to monolingual measures of relatedness. 0 0
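A minimal sketch of the interlanguage-link strategy described above: map each word to its Wikipedia article in the source language, follow the interlanguage link into the target-language edition, and apply a monolingual relatedness measure there. The link table and the placeholder monolingual scores are assumptions for illustration only, not the paper's data or exact method.

```python
# Toy interlanguage link table: (language, article title) -> equivalent article
# in the other language edition (assumed entries).
interlanguage = {
    ("en", "Dog"): ("es", "Perro"),
    ("en", "Cat"): ("es", "Gato"),
}

# Placeholder monolingual measure; any Wikipedia-based measure could be plugged in here.
toy_scores = {frozenset({"Perro", "Gato"}): 0.8}

def monolingual_relatedness(a, b):
    return toy_scores.get(frozenset({a, b}), 0.0)

def crosslingual_relatedness(word_a, word_b, source="en", target="es"):
    """Relatedness of two source-language words via their target-language articles."""
    a = interlanguage.get((source, word_a))
    b = interlanguage.get((source, word_b))
    if not a or not b or a[0] != target or b[0] != target:
        return 0.0
    return monolingual_relatedness(a[1], b[1])

print(crosslingual_relatedness("Dog", "Cat"))  # 0.8 on the toy data
```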
Domain specific ontology on computer science Salahli M.A.
Gasimzade T.M.
Guliyev A.I.
ICSCCW 2009 - 5th International Conference on Soft Computing, Computing with Words and Perceptions in System Analysis, Decision and Control English 2009 In this paper we introduce an application system based on a domain-specific ontology. Some design problems of the ontology are discussed. The ontology is based on the WordNet database and consists of Turkish and English terms on computer science and informatics. Second, we present a method for determining a set of words related to a given concept and for computing the degree of semantic relatedness between them. The presented method has been used in the semantic search process carried out by our application. 0 0
Effective extraction of thematically grouped key terms from text Maria Grineva
Maxim Grinev
Dmitry Lizorkin
AAAI Spring Symposium - Technical Report English 2009 We present a novel method for the extraction of key terms from text documents. The important and novel feature of our method is that it produces groups of key terms, where each group contains key terms semantically related to one of the main themes of the document. Our method is based on a combination of the following two techniques: a Wikipedia-based semantic relatedness measure for terms and an algorithm for detecting the community structure of a network. One of the advantages of our method is that it does not require any training, as it works upon the Wikipedia knowledge base. Our experimental evaluation using human judgments shows that our method produces key terms with high precision and recall. 0 0
Extracting Key Terms From Noisy and Multitheme Documents Maria Grineva
Maxim Grinev
Dmitry Lizorkin
WWW2009: 18th International World Wide Web Conference 2009 We present a novel method for key term extraction from text documents. In our method, a document is modeled as a graph of semantic relationships between the terms of that document. We exploit the following remarkable feature of the graph: the terms related to the main topics of the document tend to bunch up into densely interconnected subgraphs or communities, while non-important terms fall into weakly interconnected communities, or even become isolated vertices. We apply graph community detection techniques to partition the graph into thematically cohesive groups of terms. We introduce a criterion function to select groups that contain key terms, discarding groups with unimportant terms. To weight terms and determine the semantic relatedness between them, we exploit information extracted from Wikipedia. This approach gives us the following two advantages. First, it allows multi-theme documents to be processed effectively. Second, it is good at filtering out noise in the document, such as navigational bars or headers in web pages. Evaluations of the method show that it outperforms existing methods, producing key terms with higher precision and recall. Additional experiments on web pages show that our method is substantially more effective on noisy and multi-theme documents than existing methods. 0 0
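A minimal sketch of the term-graph partitioning idea from the abstract above, using the generic greedy-modularity algorithm from networkx as a stand-in for the paper's community detection technique. The term graph, relatedness weights, and density threshold are all toy assumptions.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy term graph: nodes are document terms, edge weights are Wikipedia-based
# semantic relatedness scores (all values assumed for illustration).
G = nx.Graph()
G.add_weighted_edges_from([
    ("Panthera", "Jaguar", 0.9), ("Jaguar", "Felidae", 0.8), ("Panthera", "Felidae", 0.85),
    ("HTML", "Web page", 0.7), ("Web page", "Hyperlink", 0.6),
    ("Felidae", "Hyperlink", 0.05),  # weak cross-topic edge (noise)
])

# Partition the graph into communities of strongly related terms.
communities = greedy_modularity_communities(G, weight="weight")

def density(nodes):
    """Fraction of possible edges present inside a group of terms."""
    sub = G.subgraph(nodes)
    n = len(nodes)
    return 2 * sub.number_of_edges() / (n * (n - 1)) if n > 1 else 0.0

# Keep only dense groups as candidate key-term clusters; the 0.5 cut-off is arbitrary.
key_groups = [set(c) for c in communities if density(c) >= 0.5]
print(key_groups)
```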
Link detection with Wikipedia He J. Lecture Notes in Computer Science English 2009 This paper describes our participation in the INEX 2008 Link the Wiki track. We focused on the file-to-file task and submitted three runs, which were designed to compare the impact of different features on link generation. For outgoing links, we introduce the anchor likelihood ratio as an indicator for anchor detection, and explore two types of evidence for target identification, namely the title field evidence and the topic article content evidence. We find that the anchor likelihood ratio is a useful indicator for anchor detection, and that in addition to the title field evidence, re-ranking with the topic article content evidence is effective for improving target identification. For incoming links, we use an exact match approach and a retrieval method based on language modeling, and find that the exact match approach works best. On top of that, our experiments show that the semantic relatedness between Wikipedia articles also has a certain ability to indicate links. 0 0
Related terms search based on WordNet / Wiktionary and its application in Ontology Matching Feiyu Lin Andrew Krizhanovsky RCDL 2009 A set of ontology matching algorithms (for finding correspondences between concepts) is based on a thesaurus that provides the source data for the semantic distance calculations. In this wiki era, new resources may spring up and improve this kind of semantic search. In this paper a solution to this task based on the Russian Wiktionary is compared to WordNet-based algorithms. Metrics are evaluated using a test collection containing 353 English word pairs with a relatedness score assigned by human evaluators. The experiment shows that the proposed method is capable in principle of calculating the semantic distance between a pair of words in any language presented in the Russian Wiktionary. The calculation of the Wiktionary-based metric required the development of open-source Wiktionary parser software. 0 0
Using Wikipedia and Wiktionary in domain-specific information retrieval Muller C.
Iryna Gurevych
Lecture Notes in Computer Science English 2009 The main objective of our experiments in the domain-specific track at CLEF 2008 is to utilize semantic knowledge from collaborative knowledge bases such as Wikipedia and Wiktionary to improve the effectiveness of information retrieval. While Wikipedia has already been used in IR, the application of Wiktionary in this task is new. We evaluate two retrieval models, i.e. SR-Text and SR-Word, based on semantic relatedness, by comparing their performance to a statistical model as implemented by Lucene. We refer to Wikipedia article titles and Wiktionary word entries as concepts and map query and document terms to concept vectors which are then used to compute the document relevance. In the bilingual task, we translate the English topics into the document language, i.e. German, by using machine translation. For SR-Text, we alternatively perform the translation process by using cross-language links in Wikipedia, whereby the terms are directly mapped to concept vectors in the target language. The evaluation shows that the latter approach especially improves the retrieval performance in cases where the machine translation system incorrectly translates query terms. 0 0
Using Wikipedia as a reference for extracting semantic information from a text Marco Ronchetti Andrea Prato The Third International Conference on Advances in Semantic Processing http://www.iaria.org/conferences2009/SEMAPRO09.html SEMAPRO 2009, Malta 2009 In this paper we present an algorithm that, using Wikipedia as a reference, extracts semantic information from an arbitrary text. Our algorithm refines a procedure proposed by others, which mines all the text contained in the whole of Wikipedia. Our refinement, based on a clustering approach, exploits the semantic information contained in certain types of Wikipedia hyperlinks, and also introduces an analysis based on multi-words. Our algorithm outperforms current methods in that the output contains many fewer false positives. We were also able to understand which (structural) part of the texts provides most of the semantic information extracted by the algorithm. 0 0
Using Wikipedia-Based Conceptual Contexts to Calculate Document Similarity Fabian Kaiser
Holger Schwarz
ICDS English 2009 0 0
Using wordnet's semantic relations for opinion detection in blogs Missen M.M.S.
Boughanem M.
Lecture Notes in Computer Science English 2009 Opinion detection in blogs has always been a challenge for researchers. One of the challenges is to find documents that specifically contain opinions on the user's information need. This requires text processing at the sentence level rather than at the document level. In this paper, we propose an opinion detection approach that addresses this problem by processing documents at the sentence level, using different WordNet semantic similarity relations between sentence words and a list of weighted query words expanded through the Wikipedia encyclopedia. According to initial results, our approach performs well, with a MAP of 0.28 and P@10 of 0.64, an improvement of 27% over the baseline results. The TREC Blog 2006 data is used as the test collection. 0 0
Wikipedia based semantic related Chinese words exploring and relatedness computing Yanyan Li
Huang K.-Y.
Ren F.-J.
Zhong Y.-X.
Beijing Youdian Daxue Xuebao/Journal of Beijing University of Posts and Telecommunications Chinese 2009 To investigate how to collect semantically related words and calculate semantic relatedness, an experiment is conducted in which about 50 thousand documents are downloaded from the Chinese Wikipedia web site and the in-text hyperlinks, which contain semantic information, are extracted. By mining hyperlinked references in documents, about 400 thousand semantically related word pairs are collected. In further experiments on topic groups of related words, tightly related words are grouped into smaller sets and an average semantic relatedness is calculated. Semantic relatedness is calculated using information on hyperlink positions and frequencies in documents. By comparing with the results of classic algorithms, the reliability of the new measures is analyzed. 0 0
A Self-Adaptive Explicit Semantic Analysis Method for Computing Semantic Relatedness Using Wikipedia Weiping Wang
Peng Chen
Bowen Liu
FITME English 2008 0 0
An effective, low-cost measure of semantic relatedness obtained from wikipedia links Milne D.
Witten I.H.
AAAI Workshop - Technical Report English 2008 This paper describes a new technique for obtaining measures of semantic relatedness. Like other recent approaches, it uses Wikipedia to provide structured world knowledge about the terms of interest. Our approach is unique in that it does so using the hyperlink structure of Wikipedia rather than its category hierarchy or textual content. Evaluation with manually defined measures of semantic relatedness reveals this to be an effective compromise between the ease of computation of the former approach and the accuracy of the latter. 0 1
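The measure described above is computed from the sets of articles that link to each concept. A rough sketch is given below, following the Normalized Google Distance style of formula over inlink sets; converting the distance to a relatedness score via 1 - d (clipped at 0) is a common convention and an assumption here, as are the toy inlink sets and article count.

```python
import math

W = 1_000_000  # assumed total number of Wikipedia articles

# Toy inlink sets: concept -> set of articles that link to it (assumed data).
inlinks = {
    "Jaguar (animal)": {"Panthera", "Big cat", "South America", "Felidae"},
    "Leopard": {"Panthera", "Big cat", "Africa"},
}

def link_relatedness(a, b):
    """NGD-style relatedness over shared inlinks, returned as 1 - distance, clipped at 0."""
    A, B = inlinks[a], inlinks[b]
    common = len(A & B)
    if common == 0:
        return 0.0
    d = (math.log(max(len(A), len(B))) - math.log(common)) / \
        (math.log(W) - math.log(min(len(A), len(B))))
    return max(0.0, 1.0 - d)

print(link_relatedness("Jaguar (animal)", "Leopard"))  # ~0.95 on the toy data
```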
Exploiting the collective intelligence contained in Wikipedia to automatically describe the content of a document Marco Ronchetti Anuradha Jambunathan 2008 The Wikipedia phenomenon is very interesting from the point of view of the collective, social effort to produce a large, strongly interlinked body of knowledge. It also offers, for the first time in history, a general source of information coded in electronic form and freely available to anyone. As such, it can be used as a reference for tools aiming at mining semantic meaning from generic documents. In this paper, we propose a clustering-based method that exploits some of the implicit knowledge built into Wikipedia to refine and ameliorate existing approaches. 0 0
GeoSR: Geographically explore semantic relations in world knowledge Brent Hecht
Raubal M.
Lecture Notes in Geoinformation and Cartography English 2008 Methods to determine the semantic relatedness (SR) value between two lexically expressed entities abound in the field of natural language processing (NLP). The goal of such efforts is to identify a single measure that summarizes the number and strength of the relationships between the two entities. In this paper, we present GeoSR, the first adaptation of SR methods to the context of geographic data exploration. By combining the first use of a knowledge repository structure that is replete with non-classical relations, a new means of explaining those relations to users, and the novel application of SR measures to a geographic reference system, GeoSR allows users to geographically navigate and investigate the world knowledge encoded in Wikipedia. There are numerous visualization and interaction paradigms possible with GeoSR; we present one implementation as a proof-of-concept and discuss others. Although Wikipedia is used as the knowledge repository for our implementation, GeoSR will also work with any knowledge repository having a similar set of properties. 0 0
Improving interaction with virtual globes through spatial thinking: Helping users ask "Why?" Schoming J.
Raubal M.
Marsh M.
Brent Hecht
Antonio Kruger
Michael Rohs
International Conference on Intelligent User Interfaces, Proceedings IUI English 2008 Virtual globes have progressed from little-known technology to broadly popular software in a mere few years. We investigated this phenomenon through a survey and discovered that, while virtual globes are en vogue, their use is restricted to a small set of tasks so simple that they do not involve any spatial thinking. Spatial thinking requires that users ask "what is where" and "why"; the most common virtual globe tasks only include the "what". Based on the results of this survey, we have developed a multi-touch virtual globe derived from an adapted virtual globe paradigm designed to widen the potential uses of the technology by helping its users to inquire about both the "what is where" and "why" of spatial distribution. We do not seek to provide users with full GIS (geographic information system) functionality, but rather we aim to facilitate the asking and answering of simple "why" questions about general topics that appeal to a wide virtual globe user base. 0 0
Mapping the Zeitgeist Johannes Schoning Brent Hecht Fifth International Conference on Geographic Information Science (GIScience) 2008 0 0
Searching and computing for vocabularies with semantic correlations from Chinese Wikipedia Yanyan Li
Huang K.
Ren F.
Zhong Y.
IET Conference Publications English 2008 This paper introduces an experiment on searching for semantically correlated vocabularies in Chinese Wikipedia pages and computing semantic correlations. Based on the 54,745 structured documents generated from Wikipedia pages, we explore about 400,000 pairs of Wikipedia vocabulary terms, considering hyperlinks, overlapping text, and document positions. Semantic relatedness is calculated based on the relatedness of Wikipedia documents. From a comparison experiment we analyze the reliability of our measures and some other properties. 0 0
Semantic relatedness metric for Wikipedia concepts based on link analysis and its application to word sense disambiguation Denis Turdakov
Pavel Velikhov
CEUR Workshop Proceedings English 2008 Wikipedia has grown into a high-quality, up-to-date knowledge base and can enable many knowledge-based applications, which rely on semantic information. One of the most general and quite powerful semantic tools is a measure of semantic relatedness between concepts. Moreover, the ability to efficiently produce a list of ranked similar concepts for a given concept is very important for a wide range of applications. We propose to use a simple measure of similarity between Wikipedia concepts, based on Dice's measure, and provide very efficient heuristic methods to compute the top-k ranking results. Furthermore, since our heuristics are based on statistical properties of scale-free networks, we show that these heuristics are applicable to other complex ontologies. Finally, in order to evaluate the measure, we have used it to solve the problem of word sense disambiguation. Our approach to word sense disambiguation is based solely on the similarity measure and produces results with high accuracy. 0 1
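The Dice-based similarity mentioned in the abstract above can be written compactly over the link neighborhoods of two articles; the snippet below is a toy illustration with assumed neighbor sets, and it omits the paper's top-k heuristics.

```python
def dice(neighbors_a, neighbors_b):
    """Dice coefficient between two sets of linked Wikipedia articles."""
    if not neighbors_a and not neighbors_b:
        return 0.0
    return 2.0 * len(neighbors_a & neighbors_b) / (len(neighbors_a) + len(neighbors_b))

# Toy link neighborhoods (assumed): shared neighbors drive the score toward 1.
print(dice({"Panthera", "Felidae", "Big cat"}, {"Panthera", "Big cat", "Africa"}))  # ~0.67
```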
Wikipedia link structure and text mining for semantic relation extraction towards a huge scale global web ontology Kotaro Nakayama
Takahiro Hara
Shojiro Nishio
CEUR Workshop Proceedings English 2008 Wikipedia, a collaborative Wiki-based encyclopedia, has become a huge phenomenon among Internet users. It covers a huge number of concepts in various fields such as Arts, Geography, History, Science, Sports and Games. Since it is becoming a database storing all human knowledge, Wikipedia mining is a promising approach that bridges the Semantic Web and the Social Web (a.k.a. Web 2.0). In fact, previous research on Wikipedia mining has strongly demonstrated that Wikipedia has a remarkable capability as a corpus for knowledge extraction, especially for relatedness measurement among concepts. However, semantic relatedness is just a numerical strength of a relation and does not have an explicit relation type. To extract inferable semantic relations with explicit relation types, we need to analyze not only the link structure but also the texts in Wikipedia. In this paper, we propose a consistent approach to semantic relation extraction from Wikipedia. The method consists of three sub-processes highly optimized for Wikipedia mining: 1) fast preprocessing, 2) POS (Part Of Speech) tag tree analysis, and 3) mainstay extraction. Furthermore, our detailed evaluation proved that link structure mining improves both the accuracy and the scalability of semantic relation extraction. 0 0
Computing semantic relatedness using Wikipedia link structure Milne D. Proceedings of NZCSRSC 2007, the 5th New Zealand Computer Science Research Student Conference English 2007 This paper describes a new technique for obtaining measures of semantic relatedness. Like other recent approaches, it uses Wikipedia to provide a vast amount of structured world knowledge about the terms of interest. Our system, the Wikipedia Link Vector Model or WLVM, is unique in that it does so using only the hyperlink structure of Wikipedia rather than its full textual content. To evaluate the algorithm we use a large, widely used test set of manually defined measures of semantic relatedness as our benchmark. This allows direct comparison of our system with other similar techniques. 0 2
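A rough sketch in the spirit of the WLVM described above: each article becomes a vector over its outgoing link targets, each link weighted by its occurrence count times the log of the total article count divided by the number of articles linking to that target, and two articles are compared by the angle (cosine) between their vectors. The counts and weighting details below are assumptions for illustration, not values or code from the paper.

```python
import math

TOTAL_ARTICLES = 1_000_000  # assumed size of Wikipedia

# Toy data: article -> {link target: number of times the link occurs in the article}
outlinks = {
    "Jaguar (animal)": {"Felidae": 3, "Panthera": 2, "South America": 1},
    "Leopard": {"Felidae": 2, "Panthera": 3, "Africa": 1},
}
# Toy counts of how many articles link to each target.
inlink_counts = {"Felidae": 120, "Panthera": 80, "South America": 4000, "Africa": 9000}

def link_vector(article):
    """Weight each outgoing link by occurrence count x log(N / inlink count)."""
    return {t: c * math.log(TOTAL_ARTICLES / inlink_counts[t])
            for t, c in outlinks[article].items()}

def cosine(u, v):
    """Cosine of the angle between two sparse link vectors (dicts)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

print(cosine(link_vector("Jaguar (animal)"), link_vector("Leopard")))
```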
What to be? - Electronic Career Guidance based on semantic relatedness Iryna Gurevych
Muller C.
Torsten Zesch
ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics English 2007 We present a study aimed at investigating the use of semantic information in a novel NLP application, Electronic Career Guidance (ECG), in German. ECG is formulated as an information retrieval (IR) task, whereby textual descriptions of professions (documents) are ranked for their relevance to natural language descriptions of a person's professional interests (the topic). We compare the performance of two semantic IR models: (IR-1) utilizing semantic relatedness (SR) measures based on either wordnet or Wikipedia and a set of heuristics, and (IR-2) measuring the similarity between the topic and documents based on Explicit Semantic Analysis (ESA) (Gabrilovich and Markovitch, 2007). We evaluate the performance of SR measures intrinsically on the tasks of (T-1) computing SR, and (T-2) solving Reader's Digest Word Power (RDWP) questions. 0 0
Synonym search in Wikipedia: Synarcher Andrew Krizhanovsky 11th International Conference "Speech and Computer" SPECOM'2006. Russia, St. Petersburg, June 25-29, pp. 474-477 2006 The program Synarcher was developed for searching synonyms (and related terms) in a text corpus with a special structure (Wikipedia). The results of the search are presented in the form of a graph. It is possible to explore the graph and search for graph elements interactively. The paper presents an adapted HITS algorithm for synonym search, the program architecture, and an evaluation of the program on test examples. The proposed algorithm can be applied to query expansion with synonyms (in a search engine) and to building a synonym dictionary. 0 0