Classification of scientific publications according to library controlled vocabularies: A new concept matching-based approach
Abstract Purpose: This paper aims to report on the design and development of a new approach for automatic classification and subject indexing of research documents in scientific digital libraries and repositories (DLR) according to library controlled vocabularies such as DDC and FAST. Design/methodology/approach: The proposed concept matching-based approach (CMA) detects key Wikipedia concepts occurring in a document and searches the OPACs of conventional libraries via querying the WorldCat database to retrieve a set of MARC records which share one or more of the detected key concepts. Then the semantic similarity of each retrieved MARC record to the document is measured and, using an inference algorithm, the DDC classes and FAST subjects of those MARC records which have the highest similarity to the document are assigned to it. Findings: The accuracy of the DDC classes and FAST subjects automatically assigned to a set of research documents is evaluated using the standard information retrieval measures of precision, recall and F1. The authors demonstrate that the proposed approach is more accurate than a similar system currently deployed in a large-scale scientific search engine. Originality/value: The proposed approach enables the development of a new type of subject classification system for DLR, and addresses some of the problems similar systems suffer from, such as the problem of imbalanced training data encountered by machine learning-based systems, and the problem of word-sense ambiguity encountered by string matching-based systems.
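The abstract says only that the DDC classes and FAST subjects of the most similar MARC records are assigned to the document and that the results are scored by precision, recall and F1; it does not specify the similarity measure or the inference algorithm. The Python sketch below is a hypothetical, simplified illustration of those last two steps, not the authors' CMA. The names `MarcRecord`, `jaccard`, `assign_labels` and `precision_recall_f1` are invented for illustration, Jaccard overlap of concept sets stands in for the paper's semantic similarity, and similarity-weighted voting stands in for its inference algorithm. Concept detection and WorldCat querying are assumed to have already happened.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MarcRecord:
    # Hypothetical, simplified view of a MARC record retrieved from WorldCat
    concepts: set[str]        # key concepts assumed to be extracted from the record
    ddc_classes: list[str]    # DDC class numbers carried by the record
    fast_subjects: list[str]  # FAST subject headings carried by the record


def jaccard(a: set[str], b: set[str]) -> float:
    # Stand-in similarity (assumption): overlap of the two concept sets
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_labels(doc_concepts: set[str], records: list[MarcRecord],
                  top_k: int = 3, field: str = "ddc_classes") -> list[str]:
    # Stand-in inference (assumption): each label scores the summed
    # similarity of the records that carry it; keep the top_k non-zero labels
    scores: dict[str, float] = defaultdict(float)
    for rec in records:
        sim = jaccard(doc_concepts, rec.concepts)
        for label in getattr(rec, field):
            scores[label] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, score in ranked[:top_k] if score > 0]


def precision_recall_f1(assigned: list[str], gold: list[str]) -> tuple[float, float, float]:
    # Standard set-based precision, recall and F1 against gold-standard labels
    assigned_set, gold_set = set(assigned), set(gold)
    tp = len(assigned_set & gold_set)
    p = tp / len(assigned_set) if assigned_set else 0.0
    r = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


if __name__ == "__main__":
    # Illustrative sample data only; the class numbers are made up for the demo
    doc = {"machine learning", "digital library", "classification"}
    records = [
        MarcRecord({"digital library", "classification"}, ["025.04"], ["Digital libraries"]),
        MarcRecord({"machine learning"}, ["006.31"], ["Machine learning"]),
        MarcRecord({"cooking"}, ["641.5"], ["Cooking"]),
    ]
    ddc = assign_labels(doc, records)
    print(ddc)  # → ['025.04', '006.31']
    print(precision_recall_f1(ddc, ["025.04"]))
```

Summing similarities lets a class supported by several moderately similar records outrank one supported by a single record; a stricter reading of the abstract would instead take only the labels of the single most similar record.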
Bibtextype article
Doi 10.1108/LHT-03-2013-0030
Has author Joorabchi A., Mahdi A.E.
Has keyword Automatic classification, Concept matching, Dewey Decimal Classification (DDC), FAST subject headings, Information retrieval, Libraries, Metadata generation, Scientific digital libraries and repositories, Subject indexing, Subject metadata, Wikipedia, WorldCat
Issn 0737-8831
Issue 4
Language English
Number of citations by publication 0
Number of references by publication 0
Pages 725–747
Published in Library Hi Tech
Title Classification of scientific publications according to library controlled vocabularies: A new concept matching-based approach
Type journal article
Volume 31
Year 2013
Creation date 6 November 2014 16:57:18
Categories Publications without license parameter, Publications without remote mirror parameter, Publications without archive mirror parameter, Publications without paywall mirror parameter, Journal articles, Publications without references parameter, Publications
Modification date 6 November 2014 16:57:18
Date 2013