Automatic keyphrase annotation of scientific documents using Wikipedia and genetic algorithms
Abstract Topical annotation of documents with keyphrases is a proven method for revealing the subject of scientific and research documents to both human readers and information retrieval systems. This article describes a machine learning-based keyphrase annotation method for scientific documents that utilizes Wikipedia as a thesaurus for candidate selection from documents' content. We have devised a set of 20 statistical, positional and semantic features for candidate phrases to capture and reflect various properties of those candidates that have the highest keyphraseness probability. We first introduce a simple unsupervised method for ranking and filtering the most probable keyphrases, and then evolve it into a novel supervised method using genetic algorithms. We have evaluated the performance of both methods on a third-party dataset of research papers. Reported experimental results show that the performance of our proposed methods, measured in terms of consistency with human annotators, is on a par with that achieved by humans and outperforms rival supervised and unsupervised methods.
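The abstract describes the approach only at a high level. The Python sketch below illustrates the general idea under stated assumptions and is not the authors' implementation: Wikipedia-based candidate selection is not modeled, each candidate phrase carries three made-up feature values (rather than the paper's 20), the unsupervised baseline ranks candidates by an equally weighted sum of features, and a simple genetic algorithm evolves per-feature weights to maximize agreement with human-assigned keyphrases. Agreement is measured here with Rolling's inter-indexer consistency, 2C/(A+B), which is assumed rather than confirmed as the paper's exact measure. The toy data and all names (`TOY_DOCS`, `rank`, `fitness`, `evolve`) are hypothetical.

```python
import random

# Hypothetical toy data: each candidate phrase maps to three made-up feature
# values in [0, 1] (e.g. frequency, position, a Wikipedia-based score).
# Each document pairs its candidates with a set of human-assigned keyphrases.
TOY_DOCS = [
    ({"genetic algorithm": [0.9, 0.2, 0.8], "data set": [0.95, 0.9, 0.1],
      "keyphrase": [0.6, 0.1, 0.9], "result": [0.7, 0.5, 0.05]},
     {"genetic algorithm", "keyphrase"}),
    ({"wikipedia": [0.8, 0.3, 0.95], "paper": [0.9, 0.8, 0.1],
      "text mining": [0.5, 0.2, 0.85], "method": [0.85, 0.6, 0.2]},
     {"wikipedia", "text mining"}),
]


def rolling_consistency(predicted, gold):
    """Rolling's inter-indexer consistency: 2C / (A + B)."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0
    common = len(predicted & gold)
    return 2 * common / (len(predicted) + len(gold))


def rank(candidates, weights, top_k):
    """Score candidates by a weighted sum of their features and keep the top_k."""
    def score(item):
        return sum(w * f for w, f in zip(weights, item[1]))
    ranked = sorted(candidates.items(), key=score, reverse=True)
    return [phrase for phrase, _ in ranked[:top_k]]


def fitness(weights, docs, top_k):
    """Mean consistency with the human keyphrases over all training documents."""
    return sum(rolling_consistency(rank(cands, weights, top_k), gold)
               for cands, gold in docs) / len(docs)


def evolve(docs, n_features, top_k=2, pop_size=20, generations=30, seed=0):
    """Evolve feature weights with truncation selection, one-point crossover
    and Gaussian mutation; the better half of each generation survives."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, docs, top_k), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features) if n_features > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover (new list)
            i = rng.randrange(n_features)
            # Gaussian mutation of one weight, clamped to [0, 1]
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0.0, 0.1)))
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, docs, top_k))


if __name__ == "__main__":
    baseline = [1.0, 1.0, 1.0]  # unsupervised: all features weighted equally
    best = evolve(TOY_DOCS, n_features=3)
    print("baseline consistency:", fitness(baseline, TOY_DOCS, 2))
    print("evolved consistency: ", fitness(best, TOY_DOCS, 2))
```

Because the better half of each generation is carried over unchanged, the best fitness in the population never decreases from one generation to the next. The authors' actual feature set, genetic operators and fitness function may differ from these choices.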
Bibtextype article
Doi 10.1177/0165551512472138
Has author Joorabchi A., Mahdi A.E.
Has extra keyword Keyphrase annotations, Keyphrase indexing, Metadata generation, Scientific digital libraries, Subject metadatas, Text mining, Wikipedia, Data mining, Data processing, Digital libraries, Genetic algorithms, Websites, Metadata
Has keyword Genetic algorithms, Keyphrase annotation, Keyphrase indexing, Metadata generation, Scientific digital libraries, Subject metadata, Text mining, Wikipedia
Issn 0165-5515
Issue 3
Language English
Number of citations by publication 0
Number of references by publication 0
Pages 410–426
Published in Journal of Information Science
Title Automatic keyphrase annotation of scientific documents using Wikipedia and genetic algorithms
Type journal article
Volume 39
Year 2013
Creation date 7 November 2014 09:45:00
Categories Publications without license parameter, Publications without remote mirror parameter, Publications without archive mirror parameter, Publications without paywall mirror parameter, Journal articles, Publications
Modification date 7 November 2014 09:45:00
Date 2013