Creating an extended named entity dictionary from wikipedia
|Author(s)||Higashinaka R., Tsu K.S., Saito K., Makino T., Matsuo Y.|
|Published in||24th International Conference on Computational Linguistics - Proceedings of COLING 2012: Technical Papers|
|Keyword(s)||Dictionary, Extended named entity, Wikipedia (Extra: Automatic method, Classification results, Entity-types, Information Extraction, Multi-class classifier, Named entities, Wikipedia, Computational linguistics, Glossaries, Websites)|
Creating an extended named entity dictionary from wikipedia is a 2012 conference paper written in English by Higashinaka R., Tsu K.S., Saito K., Makino T., and Matsuo Y., published in the 24th International Conference on Computational Linguistics - Proceedings of COLING 2012: Technical Papers.
Automatic methods to create entity dictionaries or gazetteers have used only a small number of entity types (18 at maximum), which could pose a limitation for fine-grained information extraction. This paper aims to create a dictionary of 200 extended named entity (ENE) types. Using Wikipedia as a basic resource, we classify Wikipedia titles into ENE types to create an ENE dictionary. In our method, we derive a large number of features for Wikipedia titles and train a multiclass classifier by supervised learning. We devise an extensive list of features for accurate classification into the ENE types, such as those related to the surface string of a title, the content of the article, and the metadata provided with Wikipedia. Our experiments show that it is possible to classify Wikipedia titles into ENE types with 79.63% accuracy. We applied our classifier to all Wikipedia titles and, by discarding low-confidence classification results, created an ENE dictionary of over one million entities covering 182 ENE types with an estimated accuracy of 89.48%. This is the first large-scale ENE dictionary.
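The pipeline described in the abstract (features from the title string, the article content, and the metadata; a supervised multiclass classifier; discarding low-confidence results) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the classifier or the exact features, so a small multinomial naive Bayes stands in for the classifier, and the names `extract_features`, `NaiveBayes`, and `build_dictionary`, the toy pages, the two labels, and the 0.8 threshold are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def extract_features(title, text, categories):
    """Turn one page into a bag of string features (illustrative choices only)."""
    feats = Counter()
    tokens = title.lower().split()
    for tok in tokens:                      # surface string of the title
        feats["title_tok=" + tok] += 1
    if tokens:
        feats["title_last=" + tokens[-1]] += 1
    first_sentence = text.split(".")[0].lower()
    for word in first_sentence.split():     # content of the article
        feats["body=" + word] += 1
    for cat in categories:                  # metadata (category labels)
        feats["cat=" + cat.lower()] += 1
    return feats


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (a stand-in classifier)."""

    def fit(self, feature_bags, labels):
        self.label_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(feature_bags, labels):
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict_proba(self, feats):
        total = sum(self.label_counts.values())
        log_scores = {}
        for label, n in self.label_counts.items():
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for f, c in feats.items():
                if f in self.vocab:         # ignore features unseen in training
                    score += c * math.log((self.feat_counts[label][f] + 1) / denom)
            log_scores[label] = score
        top = max(log_scores.values())      # normalise in a numerically stable way
        exp = {label: math.exp(s - top) for label, s in log_scores.items()}
        z = sum(exp.values())
        return {label: v / z for label, v in exp.items()}


def build_dictionary(model, pages, threshold):
    """Keep a title only if its best label reaches the confidence threshold."""
    dictionary = {}
    for title, text, cats in pages:
        probs = model.predict_proba(extract_features(title, text, cats))
        label, p = max(probs.items(), key=lambda kv: kv[1])
        if p >= threshold:
            dictionary[title] = label
    return dictionary


# Toy labelled pages; the labels are illustrative, not the actual ENE inventory.
TRAIN = [
    ("Mount Fuji", "Mount Fuji is a mountain in Japan.", ["Mountains of Japan"], "Mountain"),
    ("Mount Everest", "Mount Everest is the highest mountain on Earth.", ["Mountains of Nepal"], "Mountain"),
    ("Lake Biwa", "Lake Biwa is a freshwater lake in Japan.", ["Lakes of Japan"], "Lake"),
    ("Lake Geneva", "Lake Geneva is a lake in Switzerland.", ["Lakes of Switzerland"], "Lake"),
]
model = NaiveBayes().fit(
    [extract_features(t, x, c) for t, x, c, _ in TRAIN],
    [label for _, _, _, label in TRAIN],
)
pages = [
    ("Mount Kilimanjaro", "Mount Kilimanjaro is a mountain in Tanzania.", ["Mountains of Tanzania"]),
    ("Unknown Page", "", []),               # no known features, so low confidence
]
ene_dict = build_dictionary(model, pages, threshold=0.8)
print(ene_dict)
```

Raising the threshold trades coverage for accuracy, which matches the trade-off reported in the abstract: classifying every title gives 79.63% accuracy, while keeping only confident predictions gives a smaller dictionary with an estimated accuracy of 89.48%.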