|Building distant supervised relation extractors|
|Author(s)||Nunes T., Schwabe D.|
|Published in||Proceedings - 2014 IEEE International Conference on Semantic Computing, ICSC 2014|
|Keyword(s)||DBpedia, Distant Supervision, Information Extraction, Relation Extraction, Wikipedia (Extra: Artificial intelligence, Information retrieval, Semantics, Automatic approaches, Logistic regressions, Portuguese languages, State-of-the-art approach, Semantic Web)|
Building distant supervised relation extractors is a 2014 conference paper written in English by Nunes T. and Schwabe D., published in Proceedings - 2014 IEEE International Conference on Semantic Computing, ICSC 2014.
A well-known obstacle to building machine learning semantic relation detectors for natural language is the shortage of high-quality training instances for the target relations in multiple languages. Even when good results are achieved, the datasets used by state-of-the-art approaches are rarely published. To address these problems, this work presents an automatic approach for building multilingual semantic relation detectors through distant supervision, combining two of the largest resources of structured and unstructured content available on the Web: DBpedia and Wikipedia. We map the DBpedia ontology back to the Wikipedia text to extract more than 100,000 training instances for more than 90 DBpedia relations in English and Portuguese, without human intervention. First, we mine Wikipedia articles to find candidate instances of relations described in the DBpedia ontology. Second, we preprocess and normalize the data, filtering out irrelevant instances. Finally, we use the normalized data to train regularized logistic regression detectors that achieve an F-measure above 80% for both English and Portuguese. We also compare the impact of different types of features on the accuracy of the trained detectors, demonstrating significant performance improvements when lexical, syntactic, and semantic features are combined. Both the datasets and the code used in this research are available online.
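The three steps in the abstract (mine candidates, filter, train a regularized logistic regression detector) can be illustrated with a toy sketch. This is not the authors' code: the `KB` triples and `SENTENCES` are made-up stand-ins for DBpedia facts and Wikipedia text, all function names are hypothetical, the filter rule is an arbitrary example, the features are lexical only, and the choice of L2 regularization is an assumption, since the abstract does not name the regularizer.

```python
import math

# Hypothetical stand-ins for DBpedia triples: (subject, relation, object).
KB = [
    ("Lisbon", "capitalOf", "Portugal"),
    ("Paris", "capitalOf", "France"),
    ("Machado de Assis", "birthPlace", "Rio de Janeiro"),
]

# Hypothetical stand-ins for pre-tokenized Wikipedia sentences.
SENTENCES = [
    "Lisbon is the capital of Portugal .",
    "Lisbon ( Portugal )",
    "Paris is the capital and largest city of France .",
    "Machado de Assis was born in Rio de Janeiro .",
    "Portugal is great .",
]


def mine_candidates(kb, sentences):
    """Distant labeling: a sentence mentioning both entities of a triple
    (subject first) becomes a candidate instance of that relation."""
    candidates = []
    for subj, rel, obj in kb:
        for sent in sentences:
            start, end = sent.find(subj), sent.find(obj)
            if start != -1 and end != -1 and start + len(subj) <= end:
                between = sent[start + len(subj):end].split()
                candidates.append((rel, between))
    return candidates


def is_relevant(instance, max_gap=8):
    """Example filter (an assumption): require at least one alphabetic
    token and at most max_gap tokens between the two entities."""
    _, between = instance
    return len(between) <= max_gap and any(tok.isalpha() for tok in between)


def featurize(between):
    """Binary lexical features: the lowercased tokens between the entities."""
    return {"w=" + tok.lower() for tok in between}


def score(model, feats):
    """Logistic function applied to the linear score of the active features."""
    weights, bias = model
    z = bias + sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-z))


def train_detector(instances, target, epochs=200, lr=0.5, l2=0.01):
    """Binary detector for one relation: L2-regularized logistic
    regression fitted with batch gradient descent."""
    weights, bias = {}, 0.0
    n = len(instances)
    for _ in range(epochs):
        grad, grad_bias = {}, 0.0
        for rel, between in instances:
            feats = featurize(between)
            label = 1.0 if rel == target else 0.0
            err = score((weights, bias), feats) - label
            for f in feats:
                grad[f] = grad.get(f, 0.0) + err
            grad_bias += err
        for f, g in grad.items():
            w = weights.get(f, 0.0)
            weights[f] = w - lr * (g / n + l2 * w)
        bias -= lr * grad_bias / n
    return weights, bias


def predict(model, between):
    """Probability that the tokens between two entities express the relation."""
    return score(model, featurize(between))
```

A possible usage: `training = [c for c in mine_candidates(KB, SENTENCES) if is_relevant(c)]`, then `model = train_detector(training, "capitalOf")` and `predict(model, "is the capital of".split())`. A realistic system would add the syntactic and semantic features the abstract mentions and would evaluate on held-out data rather than on the training sentences.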