Tagging Scientific Publications Using Wikipedia and Natural Language Processing Tools
|Author(s)||Lopuszynski M., Bolikowski L.|
|Published in||Communications in Computer and Information Science|
|Keyword(s)||Natural language processing, Tagging document collections, Wikipedia (Extra: Artificial intelligence, Digital libraries, Natural language processing systems, Document collection, Document similarity, Natural language processing tools, Scientific documents, Scientific publications, Statistical properties, Learning algorithms)|
Tagging Scientific Publications Using Wikipedia and Natural Language Processing Tools is a 2014 conference paper written in English by Lopuszynski M. and Bolikowski L., published in Communications in Computer and Information Science.
In this work, we compare two simple methods of tagging scientific publications with labels reflecting their content. The first source of labels is Wikipedia; the second label set is constructed from the noun phrases occurring in the analyzed corpus. We examine the statistical properties and the effectiveness of both approaches on a dataset consisting of abstracts from 0.7 million scientific documents deposited in the arXiv preprint collection. We believe that the obtained tags can later be applied as useful document features in various machine learning tasks (document similarity, clustering, topic modelling, etc.).
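
The dictionary-based idea in the abstract (tagging a text with labels drawn from an external list such as Wikipedia) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `tokenize` and `tag_document`, the `max_len` parameter, and the three sample labels are all assumptions made for illustration, and the matching is plain case-insensitive n-gram lookup with no stemming or disambiguation.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tag_document(text, labels, max_len=3):
    """Count how often each label occurs in the text as a word n-gram.

    `labels` is a hypothetical stand-in for a list of Wikipedia titles
    or corpus noun phrases; `max_len` is the longest n-gram checked.
    """
    vocabulary = {tuple(tokenize(label)) for label in labels}
    tokens = tokenize(text)
    found = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            ngram = tuple(tokens[i:i + n])
            if ngram in vocabulary:
                found[" ".join(ngram)] += 1
    return found


# Hypothetical label list, not taken from the paper.
labels = ["Machine learning", "Topic model", "Cluster analysis"]
abstract = (
    "We apply machine learning and a topic model to preprints; "
    "machine learning helps."
)
tags = tag_document(abstract, labels)
print(tags)
```

The resulting counts could serve as sparse document features of the kind the abstract mentions, for example as input to a similarity or clustering step.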