
Extracting knowledge from Wikipedia articles through distributed semantic analysis
Abstract Computing semantic word similarity and relatedness requires access to vast amounts of semantic space for effective analysis. As a consequence, it is time-consuming to extract useful information from a large amount of data on a single workstation. In this paper, we propose a system, called Distributed Semantic Analysis (DSA), that integrates a distributed-based approach with semantic analysis. DSA builds a list of concept vectors associated with each word by exploiting the knowledge provided by Wikipedia articles. Based on such lists, DSA calculates the degree of semantic relatedness between two words through the cosine measure. The proposed solution is built on top of the Hadoop MapReduce framework and the Mahout machine learning library. Experimental results show two major improvements over the state of the art, with particular reference to the Explicit Semantic Analysis method. First, our distributed approach significantly reduces the computation time to build the concept vectors, thus enabling the use of larger inputs that is the basis for more accurate results. Second, DSA obtains a very high correlation of computed relatedness with reference benchmarks derived by human judgements. Moreover, its accuracy is higher than solutions reported in the literature over multiple benchmarks.
Bibtextype inproceedings  +
Doi 10.1145/2494188.2494195  +
Has author Hieu N.T. + , Di Francesco M. + , Yla-Jaaski A. +
Has extra keyword Distributed approaches + , Effective analysis + , Explicit semantic analysis + , Reference benchmarks + , Semantic analysis + , Semantic relatedness + , Wikipedia knowledge + , Word relatedness + , Distributed computer systems + , Knowledge management + , Semantics +
Has keyword Distributed computing + , Semantic analysis + , Wikipedia knowledge + , Word relatedness +
Isbn 9781450323000  +
Language English +
Number of citations by publication 0  +
Number of references by publication 0  +
Published in ACM International Conference Proceeding Series +
Title Extracting knowledge from Wikipedia articles through distributed semantic analysis +
Type conference paper  +
Year 2013 +
Creation date 6 November 2014 18:40:06  +
Categories Publications without license parameter  + , Publications without remote mirror parameter  + , Publications without archive mirror parameter  + , Publications without paywall mirror parameter  + , Conference papers  + , Publications without references parameter  + , Publications  +
Modification date 6 November 2014 18:40:06  +
Date 2013  +
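The abstract states that DSA represents each word as a vector of Wikipedia concepts and scores relatedness between two words with the cosine measure. As a rough single-machine illustration of that final step only, the sketch below assumes each concept vector is a sparse mapping from a concept name to a weight. The function name `cosine_relatedness`, the concept names, and the weights are all hypothetical and made up for the example; the system described in the paper builds its vectors with Hadoop MapReduce and Mahout, which this sketch does not attempt to reproduce.

```python
import math


def cosine_relatedness(u, v):
    """Cosine measure between two sparse concept vectors (dict: concept -> weight)."""
    # Iterate over the shorter vector when computing the dot product.
    if len(u) > len(v):
        u, v = v, u
    dot = sum(weight * v.get(concept, 0.0) for concept, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    # A word with no associated concepts is treated as unrelated to everything.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy concept vectors with invented weights (assumption, not data from the paper).
cat = {"Felidae": 0.8, "Pet": 0.6}
dog = {"Canidae": 0.8, "Pet": 0.6}
car = {"Automobile": 1.0}

print(cosine_relatedness(cat, dog))
print(cosine_relatedness(cat, car))
```

Words that share weighted concepts (here the hypothetical "Pet") receive a score above zero, while words with no concepts in common score zero.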