BioSnowball: Automated population of wikis
|Author(s)||Liu X., Nie Z., Yu N., Wen J.-R.|
|Published in||Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining|
|Keyword(s)||Bootstrapping, Fact extraction, Markov Logic Networks, Summarization (Extra: Decoupled methods, Empirical results, First-stop, Inference process, Internet users, Labeled data, Neutral points, Statistical models, Summarization models, Web presence, Web-scale datum, Wikipedia, Biographies)|
BioSnowball: Automated population of wikis is a 2010 conference paper written in English by Liu X., Nie Z., Yu N., Wen J.-R. and published in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Internet users regularly have the need to find biographies and facts of people of interest. Wikipedia has become the first stop for celebrity biographies and facts. However, Wikipedia can only provide information for celebrities because of its neutral point of view (NPOV) editorial policy. In this paper we propose an integrated bootstrapping framework named BioSnowball to automatically summarize the Web to generate Wikipedia-style pages for any person with a modest web presence. In BioSnowball, biography ranking and fact extraction are performed together in a single integrated training and inference process using Markov Logic Networks (MLNs) as its underlying statistical model. The bootstrapping framework starts with only a small number of seeds and iteratively finds new facts and biographies. As biography paragraphs on the Web are composed of the most important facts, our joint summarization model can improve the accuracy of both fact extraction and biography ranking compared to decoupled methods in the literature. Empirical results on both a small labeled data set and a real Web-scale data set show the effectiveness of BioSnowball. We also empirically show that BioSnowball outperforms the decoupled methods.
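The seed-driven loop mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' method: it uses plain string patterns instead of Markov Logic Networks and does no biography ranking. The function name `bootstrap_facts`, its parameters, and the sample sentences are all hypothetical, chosen only to show how a few seed facts can yield patterns that in turn yield new facts.

```python
from collections import Counter

# Hypothetical sample corpus and seed facts (not from the paper).
SENTENCES = [
    "Ada Lovelace was born in London.",
    "Alan Turing was born in London.",
    "Grace Hopper was born in New York.",
    "Grace Hopper liked tea.",
]
SEEDS = {("Ada Lovelace", "London"), ("Alan Turing", "London")}


def bootstrap_facts(sentences, seeds, max_rounds=5, min_support=2):
    """Toy bootstrapping: seed facts -> text patterns -> new facts."""
    facts = set(seeds)
    for _ in range(max_rounds):
        # Step 1: induce patterns, i.e. the text between a known
        # (person, value) pair in a sentence that contains both.
        pattern_counts = Counter()
        for sentence in sentences:
            body = sentence.rstrip(".")
            for person, value in facts:
                if body.startswith(person) and body.endswith(value):
                    middle = body[len(person):len(body) - len(value)]
                    pattern_counts[middle] += 1
        # Keep only non-blank patterns supported by enough known facts.
        patterns = [p for p, n in pattern_counts.items()
                    if p.strip() and n >= min_support]

        # Step 2: apply the patterns to every sentence to find facts.
        found = set()
        for sentence in sentences:
            body = sentence.rstrip(".")
            for pattern in patterns:
                person, sep, value = body.partition(pattern)
                if sep and person and value:
                    found.add((person, value))

        # Stop once a round produces nothing new.
        if found <= facts:
            break
        facts |= found
    return facts


facts = bootstrap_facts(SENTENCES, SEEDS)
```

With the sample data, the two seeds share the pattern " was born in ", which then matches the Grace Hopper sentence and adds one new fact. The `min_support` threshold is a crude stand-in for the statistical scoring a real system would need to keep noisy patterns from spreading.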
This publication has been cited 4 times, but no articles for the citing works are available in WikiPapers.