Using linked data to mine RDF from Wikipedia's tables
|Author(s)||Munoz E., Hogan A., Mileo A.|
|Published in||WSDM 2014 - Proceedings of the 7th ACM International Conference on Web Search and Data Mining|
|Keyword(s)||data mining, linked data, web tables, wikipedia (Extra: Artificial intelligence, Data handling, Data mining, Information retrieval, Learning systems, Semantics, Extraction phase, Knowledge base, Linked datum, Machine learning methods, Semi-structured, Web tables, Wikipedia, Wikipedia articles, Websites)|
Using linked data to mine RDF from Wikipedia's tables is a 2014 conference paper written in English by Munoz E., Hogan A., and Mileo A., published in WSDM 2014 - Proceedings of the 7th ACM International Conference on Web Search and Data Mining.
The tables embedded in Wikipedia articles contain rich, semi-structured encyclopaedic content. However, the cumulative content of these tables cannot be queried against. We thus propose methods to recover the semantics of Wikipedia tables and, in particular, to extract facts from them in the form of RDF triples. Our core method uses an existing Linked Data knowledge-base to find pre-existing relations between entities in Wikipedia tables, suggesting the same relations as holding for other entities in analogous columns on different rows. We find that such an approach extracts RDF triples from Wikipedia's tables at a raw precision of 40%. To improve the raw precision, we define a set of features for extracted triples that are tracked during the extraction phase. Using a manually labelled gold standard, we then test a variety of machine learning methods for classifying correct/incorrect triples. One such method extracts 7.9 million unique and novel RDF triples from over one million Wikipedia tables at an estimated precision of 81.5%.
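The core idea in the abstract (find relations that a knowledge base already holds between entities in two columns, then suggest the same relation for the other rows) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `suggest_triples`, the `min_support` parameter, the `ex:` identifiers, and the toy data are all hypothetical, and the sketch assumes table cells have already been linked to knowledge-base entities.

```python
from collections import Counter

# Hypothetical toy knowledge base of (subject, predicate, object) triples.
KB = {
    ("ex:Alice", "ex:memberOf", "ex:TeamA"),
    ("ex:Bob", "ex:memberOf", "ex:TeamB"),
}

# Hypothetical table: each row is a tuple of cells already linked to entities.
TABLE = [
    ("ex:Alice", "ex:TeamA"),
    ("ex:Bob", "ex:TeamB"),
    ("ex:Carol", "ex:TeamA"),
]


def suggest_triples(table, kb, min_support=1):
    """Suggest new triples for a table from relations already in the KB.

    For every ordered pair of columns, count how many rows are connected
    by each predicate in the KB, then propose that predicate for the rows
    where the KB does not yet contain the triple.
    """
    # Index KB predicates by (subject, object) for fast lookup.
    by_pair = {}
    for s, p, o in kb:
        by_pair.setdefault((s, o), set()).add(p)

    suggestions = []
    n_cols = len(table[0])
    for i in range(n_cols):
        for j in range(n_cols):
            if i == j:
                continue
            # Count the rows in which each predicate links column i to column j.
            support = Counter()
            for row in table:
                for p in by_pair.get((row[i], row[j]), ()):
                    support[p] += 1
            # Propose each sufficiently supported predicate for the other rows.
            for p, count in support.items():
                if count < min_support:
                    continue
                for row in table:
                    triple = (row[i], p, row[j])
                    if triple not in kb:
                        suggestions.append((triple, count))
    return suggestions


for triple, count in suggest_triples(TABLE, KB):
    print(triple, "supported by", count, "rows")
```

The sketch applies no filtering beyond the support threshold, so it will happily propose wrong triples. The abstract reports a raw precision of 40% for the unfiltered approach, which is why the paper adds a classification step over features tracked during extraction. The row count kept here stands in for one such feature, as an assumption.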