Information extraction from Wikipedia: Moving down the long tail
|Author(s)||Wu F., Hoffmann R., Weld D.S.|
|Published in||Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining|
|Keyword(s)||Information extraction, Semantic Web, Wikipedia (Extra: Design variations, High precision, Internal structures, Long tails, Novel techniques, Quality information, Retraining techniques, Training data, Information management, Information theory, Metal recovery, Mining, Semantics, Taxonomies, Data mining)|
Information extraction from Wikipedia: Moving down the long tail is a 2008 conference paper written in English by Wu F., Hoffmann R., and Weld D.S. and published in the Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Not only is Wikipedia a comprehensive source of quality information, it has several kinds of internal structure (e.g., relational summaries known as infoboxes), which enable self-supervised information extraction. While previous efforts at extraction from Wikipedia achieve high precision and recall on well-populated classes of articles, they fail in a larger number of cases, largely because incomplete articles and infrequent use of infoboxes lead to insufficient training data. This paper presents three novel techniques for increasing recall from Wikipedia's long tail of sparse classes: (1) shrinkage over an automatically-learned subsumption taxonomy, (2) a retraining technique for improving the training data, and (3) supplementing results by extracting from the broader Web. Our experiments compare design variations and show that, used in concert, these techniques increase recall by a factor of 1.76 to 8.71 while maintaining or increasing precision.
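The first technique, shrinkage over a subsumption taxonomy, can be illustrated with a minimal sketch. The abstract does not describe the weighting scheme, so everything here is an assumption made for illustration: the toy taxonomy `PARENT`, the sample data `EXAMPLES`, the function `shrinkage_training_set`, and the geometric `decay` weight are hypothetical and are not taken from the paper. The idea shown is only that a sparse class can borrow training examples from its ancestor classes, with more distant ancestors counting for less.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy taxonomy: child class -> parent class (None at the root).
# Assumed to be a tree, so walking upward always terminates.
PARENT: Dict[str, Optional[str]] = {
    "performer": "person",
    "person": None,
}

# Hypothetical training sentences harvested per infobox class.
EXAMPLES: Dict[str, List[str]] = {
    "performer": ["p1"],        # sparse class: very little training data
    "person": ["q1", "q2"],     # better-populated ancestor class
}


def shrinkage_training_set(
    target: str,
    parent: Dict[str, Optional[str]],
    examples: Dict[str, List[str]],
    decay: float = 0.5,
) -> List[Tuple[str, float]]:
    """Collect weighted training examples for `target` and its ancestors.

    Examples from the target class get weight 1.0; each step up the
    taxonomy multiplies the weight by `decay` (an assumed scheme).
    """
    weighted: List[Tuple[str, float]] = []
    weight = 1.0
    cls: Optional[str] = target
    while cls is not None:
        for example in examples.get(cls, []):
            weighted.append((example, weight))
        cls = parent.get(cls)   # move to the parent class, or None at the root
        weight *= decay         # more distant ancestors contribute less
    return weighted


if __name__ == "__main__":
    for example, weight in shrinkage_training_set("performer", PARENT, EXAMPLES):
        print(example, weight)
```

A learner that accepts per-example weights could then be trained on the pooled set, so that the sparse class benefits from its ancestor's data without being dominated by it.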
This publication has been cited 39 times, but none of the citing works currently has an article in WikiPapers.