Information extraction from Wikipedia using pattern learning
|Published in||Acta Cybernetica|
|Keyword(s)||Information extraction, Machine learning, Natural language processing (Extra: Custom solutions, Extraction patterns, Human knowledge, Labeled data, Labeled training data, Linguistic processing, Machine learning methods, Named entity recognition, Pattern learning, Semantic resources, Semantic Web technology, Structured information, Verb frames, Wikipedia, Computational linguistics, Information analysis, Learning algorithms, Learning systems, Semantic Web, Semantics, Natural language processing systems)|
In this paper we present solutions for the crucial task of extracting structured information from massive free-text resources such as Wikipedia, in order to populate the semantic databases that serve emerging Semantic Web technologies. We demonstrate two kinds of approach: a verb frame-based approach that uses deep natural language processing together with extraction patterns developed by human experts, and machine learning methods that rely on shallow linguistic processing. We also propose a method for learning verb frame-based extraction patterns automatically from labeled data. We show that labeled training data can be produced with only minimal human effort by exploiting existing semantic resources and the special characteristics of Wikipedia. Custom named entity recognition solutions are also possible in this scenario. Finally, we evaluate and compare the different approaches on several different relations.
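The abstract combines two ideas: producing labeled training data cheaply from an existing semantic resource, and learning verb frame-based extraction patterns from that data. The Python sketch below illustrates both in a deliberately simplified form and is not the authors' method. It assumes sentences have already been parsed into a main verb lemma with subject and complement heads (the hypothetical `ParsedSentence` type). It reduces a "verb frame" to a verb plus the preposition of its complement, labels sentences distant-supervision style by matching values already recorded in the resource, and keeps frames that pass support and precision thresholds (`min_support` and `min_precision` are illustrative choices). All names and the toy data are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedSentence:
    """A sentence after (assumed) deep parsing: the main verb and its argument heads."""
    verb: str           # lemma of the main verb, e.g. "bear" for "was born"
    subject: str        # head of the subject phrase
    obj: str            # head of the object or prepositional complement
    obj_prep: str = ""  # preposition introducing the complement, if any


def frame_of(s: ParsedSentence) -> tuple[str, str]:
    """A deliberately simple 'verb frame': verb lemma plus complement preposition."""
    return (s.verb, s.obj_prep)


def label_article(title, sentences, known_facts, relation):
    """Label sentences of the article about `title` using facts from an existing resource.

    A sentence is a positive example when its complement matches a value the
    resource already records for (title, relation); all others are negative.
    """
    values = known_facts.get((title, relation), set())
    return [(s, s.obj in values) for s in sentences]


def learn_frames(labeled, min_support=2, min_precision=0.8):
    """Keep frames seen often among positives and rarely among negatives."""
    positives, totals = Counter(), Counter()
    for sentence, is_positive in labeled:
        frame = frame_of(sentence)
        totals[frame] += 1
        if is_positive:
            positives[frame] += 1
    return {
        frame
        for frame, total in totals.items()
        if positives[frame] >= min_support and positives[frame] / total >= min_precision
    }


def extract(sentences, frames):
    """Apply learned frames to unseen sentences, returning (subject, value) pairs."""
    return [(s.subject, s.obj) for s in sentences if frame_of(s) in frames]


# Toy data standing in for parsed Wikipedia articles and an existing semantic resource.
known_facts = {
    ("Albert Einstein", "birthplace"): {"Ulm"},
    ("Marie Curie", "birthplace"): {"Warsaw"},
}
articles = {
    "Albert Einstein": [
        ParsedSentence("bear", "Einstein", "Ulm", "in"),
        ParsedSentence("work", "Einstein", "Bern", "in"),
    ],
    "Marie Curie": [
        ParsedSentence("bear", "Curie", "Warsaw", "in"),
        ParsedSentence("move", "Curie", "Paris", "to"),
    ],
}

labeled = [
    pair
    for title, sentences in articles.items()
    for pair in label_article(title, sentences, known_facts, "birthplace")
]
frames = learn_frames(labeled)
new_sentences = [
    ParsedSentence("bear", "Gauss", "Brunswick", "in"),
    ParsedSentence("work", "Gauss", "Göttingen", "in"),
]
print(frames)                          # {('bear', 'in')}
print(extract(new_sentences, frames))  # [('Gauss', 'Brunswick')]
```

On real text the matching step would need entity linking and alias handling, and a richer frame (argument roles, semantic classes of the fillers) would be needed to separate relations that share a verb, such as "born in Ulm" (a place) versus "born in 1879" (a date).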