PAN Wikipedia quality flaw corpus 2012
| PAN Wikipedia quality flaw corpus 2012 | |
| Keyword(s) | quality |
| Size | 324 MB |
| Language(s) | English |
| Author(s) | Maik Anderka, Benno Stein |
| License(s) | Unknown |
| Website | http://www.uni-weimar.de/medien/webis/research/events/pan-12/pan12-web/wikipedia-quality.html |
| Related material | |
| Related dataset(s) | PAN Wikipedia vandalism corpus 2010, PAN Wikipedia vandalism corpus 2011 |
| Related tool(s) | Unknown |
PAN Wikipedia quality flaw corpus 2012 is an evaluation corpus for the "Quality Flaw Prediction in Wikipedia" task of the PAN 2012 Lab, held in conjunction with the CLEF 2012 conference.
Publications
| Title | Author(s) | Keyword(s) | Published in | Language | Date | Abstract | R | C |
|---|---|---|---|---|---|---|---|---|
| FlawFinder: A Modular System for Predicting Quality Flaws in Wikipedia | Oliver Ferschke, Iryna Gurevych, Marc Rittberger | | PAN | English | 2012 | With over 23 million articles in 285 languages, Wikipedia is the largest free knowledge base on the web. Due to its open nature, everybody is allowed to access and edit the contents of this huge encyclopedia. As a downside of this open access policy, quality assessment of the content becomes a critical issue and is hardly manageable without computational assistance. In this paper, we present FlawFinder, a modular system for automatically predicting quality flaws in unseen Wikipedia articles. It competed in the inaugural edition of the Quality Flaw Prediction Task at the PAN Challenge 2012 and achieved the best precision of all systems and the second place in terms of recall and F1-score. | 0 | 0 |
| On the Use of PU Learning for Quality Flaw Prediction in Wikipedia | Edgardo Ferretti, Donato Hernández Fusilier, Rafael Guzmán Cabrera, Manuel Montes y Gómez, Marcelo Errecalde, Paolo Rosso | | PAN | English | 2012 | In this article we describe a new approach to assess Quality Flaw Prediction in Wikipedia. The partially supervised method studied, called PU Learning, has been successfully applied in classifications tasks with traditional corpora like Reuters-21578 or 20-Newsgroups. To the best of our knowledge, this is the first time that it is applied in this domain. Throughout this paper, we describe how the original PU Learning approach was evaluated for assessing quality flaws and the modifications introduced to get a quality flaws predictor which obtained the best F1 scores in the task “Quality Flaw Prediction in Wikipedia” of the PAN challenge. | 0 | 0 |
| Overview of the 1st International Competition on Quality Flaw Prediction in Wikipedia | Maik Anderka, Benno Stein | Information quality, Wikipedia, Quality Flaw Prediction | CLEF | English | 2012 | The paper overviews the task "Quality Flaw Prediction in Wikipedia" of the PAN'12 competition. An evaluation corpus is introduced which comprises 1,592,226 English Wikipedia articles, of which 208,228 have been tagged to contain one of ten important quality flaws. Moreover, the performance of three quality flaw classifiers is evaluated. | 0 | 0 |
