Extracting terminology from Wikipedia
|Author(s)||Vivaldi J., Rodriguez H.|
|Published in||Procesamiento de Lenguaje Natural|
|Keyword(s)||Term extraction, Term recognition, Wikipedia|
In this paper we present a new approach for obtaining the terminology of a given domain using the category and page structures of Wikipedia in a domain- and language-independent way. The idea is to take advantage of the Wikipedia category graph, starting with a set of categories that we associate with the domain. After obtaining the full set of categories belonging to the selected domain, the collection of corresponding pages is extracted, subject to some constraints. The set of titles of the recovered pages and categories is selected as the initial domain term vocabulary. The system has been evaluated by substituting it for the term candidate analyzer module of a state-of-the-art term extractor, YATE. The results show that this resource may be used for this task, overcoming some of the limitations of alternative knowledge sources. This approach has been applied to four domains (astronomy, chemistry, economics and medicine) and two languages (English and Spanish).
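The category-graph idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy graph, the function names, and the `max_depth` limit (standing in for the unspecified "constraints") are all assumptions made for illustration only.

```python
from collections import deque

# Hypothetical toy data: category -> subcategories, category -> page titles.
SUBCATS = {
    "Astronomy": ["Astronomical objects", "Observational astronomy"],
    "Astronomical objects": ["Stars"],
    "Stars": ["Fictional stars"],
}
PAGES = {
    "Astronomy": ["Astronomer"],
    "Astronomical objects": ["Planet"],
    "Observational astronomy": ["Telescope"],
    "Stars": ["Red giant"],
    "Fictional stars": ["Krypton (comics)"],
}


def collect_domain_categories(subcats, seeds, max_depth):
    """Breadth-first walk of the category graph from the seed categories.

    max_depth is an assumed constraint: categories at that depth are kept
    but not expanded, which limits drift away from the domain.
    """
    seen = set(seeds)
    queue = deque((cat, 0) for cat in seeds)
    while queue:
        cat, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for child in subcats.get(cat, ()):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return seen


def domain_vocabulary(subcats, pages, seeds, max_depth=2):
    """Return titles of the domain categories and of the pages they contain."""
    categories = collect_domain_categories(subcats, seeds, max_depth)
    vocabulary = set(categories)
    for cat in categories:
        vocabulary.update(pages.get(cat, ()))
    return vocabulary


vocab = domain_vocabulary(SUBCATS, PAGES, ["Astronomy"], max_depth=2)
```

In this toy graph the depth limit keeps "Stars" but stops before "Fictional stars", so off-domain titles reachable through deep subcategories never enter the candidate vocabulary.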
Cited 2 times. None of the citing articles are available in WikiPapers.