Wisdom of crowds versus wisdom of linguists - Measuring the semantic relatedness of words
Abstract: In this article, we present a comprehensive study aimed at computing the semantic relatedness of word pairs. We analyze the performance of a large number of semantic relatedness measures proposed in the literature with respect to different experimental conditions, such as (i) the datasets employed, (ii) the language (English or German), (iii) the underlying knowledge source, and (iv) the evaluation task (computing scores of semantic relatedness, ranking word pairs, solving word choice problems). To our knowledge, this study is the first to systematically analyze semantic relatedness on a large number of datasets with different properties, while emphasizing the role of the knowledge source, compiled either by the wisdom of linguists (i.e., classical wordnets) or by the wisdom of crowds (i.e., collaboratively constructed knowledge sources like Wikipedia). The article discusses the benefits and drawbacks of different approaches to evaluating semantic relatedness, and we show that results should be interpreted carefully to evaluate particular aspects of semantic relatedness. For the first time, we apply a vector-based measure of semantic relatedness, relying on a concept space built from documents, to the first paragraph of Wikipedia articles, to English WordNet glosses, and to GermaNet-based pseudo-glosses. Contrary to previous research (Strube and Ponzetto 2006; Gabrilovich and Markovitch 2007; Zesch et al. 2007), we find that wisdom-of-crowds-based resources are not superior to wisdom-of-linguists-based resources. We also find that using the first paragraph of a Wikipedia article, as opposed to the whole article, leads to better precision but decreases recall.
Finally, we present two systems that were developed to aid the experiments presented herein and are freely available for research purposes: (i) DEXTRACT, a software tool to semi-automatically construct corpus-driven semantic relatedness datasets, and (ii) JWPL, a Java-based high-performance Wikipedia Application Programming Interface (API) for building natural language processing (NLP) applications.
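The concept-vector idea in the abstract can be illustrated with a minimal sketch: a word is represented by how strongly it associates with each entry in a concept space (e.g., Wikipedia first paragraphs or WordNet glosses), and relatedness is the cosine of two such vectors. The concept texts below are invented toy data, and raw term counts stand in for the tf-idf weighting a real system would use; this is not the paper's actual implementation.

```python
from math import sqrt

# Toy "concept space": each concept is represented by a short text,
# standing in for a Wikipedia first paragraph or a WordNet gloss.
# Hypothetical example data, not taken from the paper.
concepts = {
    "car":     "a road vehicle with an engine and four wheels",
    "bicycle": "a vehicle with two wheels that you ride by pedalling",
    "banana":  "a long curved fruit with a yellow skin",
}

def concept_vector(word):
    """Map a word to a vector over concepts.

    Each dimension is the word's frequency in that concept's text.
    A full concept-vector measure would use tf-idf weights over a
    large collection; raw counts keep the sketch short.
    """
    return {c: text.split().count(word) for c, text in concepts.items()}

def cosine(u, v):
    """Cosine similarity of two concept vectors (0.0 if either is empty)."""
    dot = sum(u[c] * v[c] for c in u)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# "vehicle" and "wheels" activate the same concepts; "fruit" does not.
print(cosine(concept_vector("vehicle"), concept_vector("wheels")))  # close to 1.0
print(cosine(concept_vector("vehicle"), concept_vector("fruit")))   # 0.0
```

Here the word pair (vehicle, wheels) scores high because both words occur in the same two concept texts, while (vehicle, fruit) scores zero; with a large concept space, this same construction yields graded relatedness scores for arbitrary word pairs.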
Bibtextype: article
Doi: 10.1017/S1351324909990167
Has author: Torsten Zesch, Iryna Gurevych
Has extra keyword: Comprehensive studies, Concept space, Dataset, Experimental conditions, Knowledge sources, Natural Language Processing, Semantic relatedness, Wikipedia, Word choices, Wordnet, Application programming interfaces (API), Computational linguistics, Computer software, Industrial research, Java programming language, Ontology, Semantic web, Semantics, Vector spaces, Natural language processing systems
Issn: 1351-3249
Issue: 1
Language: English
Number of citations by publication: 0
Number of references by publication: 0
Pages: 25–59
Published in: Natural Language Engineering
Title: Wisdom of crowds versus wisdom of linguists - Measuring the semantic relatedness of words
Type: journal article
Volume: 16
Year: 2010
Creation date: 8 November 2014 07:41:45
Categories: Journal articles, Publications
Modification date: 8 November 2014 07:41:45
Date: 2010