Wikipedia entity expansion and attribute extraction from the web using semi-supervised learning
|Author(s)||Bing L., Lam W., Wong T.-L.|
|Published in||WSDM 2013 - Proceedings of the 6th ACM International Conference on Web Search and Data Mining|
|Keyword(s)||entity expansion, information extraction, proximate record graph, semi-supervised learning (Extra: Attribute extraction, Conditional random field, Data records, Different domains, Information Extraction, proximate record graph, Semi structured data, Semi-supervised learning, Unlabeled data, Wikipedia, Data mining, Expansion, Extraction, Information retrieval, Websites, Supervised learning)|
Wikipedia entity expansion and attribute extraction from the web using semi-supervised learning is a 2013 conference paper written in English by Bing L., Lam W., Wong T.-L. and published in WSDM 2013 - Proceedings of the 6th ACM International Conference on Web Search and Data Mining.
We develop a new framework to achieve the goal of Wikipedia entity expansion and attribute extraction from the Web. Our framework takes a few existing entities that are automatically collected from a particular Wikipedia category as seed input and explores their attribute infoboxes to obtain clues for discovering more entities for this category and the attribute content of the newly discovered entities. One characteristic of our framework is that it conducts discovery and extraction on desirable semi-structured data record sets, which are automatically collected from the Web. A semi-supervised learning model with Conditional Random Fields is developed to deal with the issues of extraction learning and the limited number of labeled examples derived from the seed entities. We make use of a proximate record graph to guide the semi-supervised learning process. The graph captures alignment similarity among data records. The semi-supervised learning process can then leverage the unlabeled data in the record set by controlling the label regularization under the guidance of the proximate record graph. Extensive experiments conducted on different domains demonstrate its superiority in discovering new entities and extracting attribute content.
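The abstract does not spell out how the proximate record graph is built or how it regularizes labels. As a rough illustration only, the sketch below links each data record to its k most alignment-similar records and then spreads seed labels over that graph so unlabeled records inherit labels from similar neighbors. Everything here is an assumption, not the authors' method: token-level alignment similarity is approximated with Python's `difflib.SequenceMatcher`, labels are assigned to whole records rather than to tokens, and plain harmonic label propagation stands in for the paper's CRF-based label regularization. The function names, the parameter `k`, and the toy records are hypothetical.

```python
from difflib import SequenceMatcher


def alignment_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for alignment similarity: matched-token ratio in [0, 1]."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def proximate_record_graph(records: list[str], k: int = 2) -> list[dict[int, float]]:
    """Link each record to its k most similar records (weighted, symmetrized edges)."""
    n = len(records)
    graph: list[dict[int, float]] = [dict() for _ in range(n)]
    for i in range(n):
        scores = sorted(
            ((alignment_similarity(records[i], records[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for sim, j in scores[:k]:
            if sim > 0:  # skip records that share no aligned tokens
                graph[i][j] = sim
                graph[j][i] = sim
    return graph


def propagate_labels(graph: list[dict[int, float]], seeds: dict[int, str],
                     labels: list[str], iters: int = 50) -> list[list[float]]:
    """Harmonic label propagation: seed records stay clamped, the rest take the
    similarity-weighted average of their neighbors' label scores."""
    n = len(graph)
    dist = [[0.0] * len(labels) for _ in range(n)]
    for i, lab in seeds.items():
        dist[i][labels.index(lab)] = 1.0
    for _ in range(iters):
        new = []
        for i in range(n):
            if i in seeds or not graph[i]:
                new.append(dist[i])
                continue
            total = sum(graph[i].values())
            new.append([
                sum(w * dist[j][c] for j, w in graph[i].items()) / total
                for c in range(len(labels))
            ])
        dist = new
    return dist


# Toy record set (hypothetical): two seed records, three unlabeled ones.
records = [
    "Inception 2010 directed by Christopher Nolan",
    "Interstellar 2014 directed by Christopher Nolan",
    "Dunkirk 2017 directed by Christopher Nolan",
    "Contact us privacy policy terms of use",
    "privacy policy terms of use sitemap",
]
labels = ["film", "noise"]
seeds = {0: "film", 3: "noise"}

dist = propagate_labels(proximate_record_graph(records, k=2), seeds, labels)
pred = [labels[max(range(len(labels)), key=lambda c: row[c])] for row in dist]
print(pred)  # ['film', 'film', 'film', 'noise', 'noise']
```

Records that share an aligned "directed by Christopher Nolan" pattern end up connected to the film seed, while the footer-like records only connect to the noise seed. In the paper's setting the labels would be attribute labels produced by a CRF, and the graph would act as a regularizer on its predictions rather than as a standalone classifier.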
This publication has been cited 3 times, but no articles for the citing works are available in WikiPapers.