Crowdsourcing a Wikipedia Vandalism Corpus


Crowdsourcing a Wikipedia Vandalism Corpus is a 2010 conference paper written in English by Martin Potthast and published in SIGIR.

Abstract

We report on the construction of the PAN Wikipedia vandalism corpus, PAN-WVC-10, using Amazon’s Mechanical Turk. The corpus compiles 32 452 edits on 28 468 Wikipedia articles, among which 2 391 vandalism edits have been identified. 753 human annotators cast a total of 193 022 votes on the edits, so that each edit was reviewed by at least 3 annotators, whereas the achieved level of agreement was analyzed in order to label an edit as “regular” or “vandalism.” The corpus is available free of charge.
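The agreement-based labeling mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the decision rule, so everything here is an assumption for illustration: the function name `label_edit`, the two-thirds agreement threshold, and the choice to return `None` for undecided edits are hypothetical. Only the minimum of 3 annotators per edit comes from the abstract.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical threshold: the abstract does not give the actual agreement rule.
ASSUMED_MIN_AGREEMENT = Fraction(2, 3)
# From the abstract: each edit was reviewed by at least 3 annotators.
MIN_VOTES = 3


def label_edit(votes, min_votes=MIN_VOTES, min_agreement=ASSUMED_MIN_AGREEMENT):
    """Return the majority label if agreement is high enough, else None.

    votes: list of strings, each either "regular" or "vandalism".
    """
    if len(votes) < min_votes:
        return None  # not enough annotators have reviewed this edit yet
    label, count = Counter(votes).most_common(1)[0]
    # Exact rational comparison avoids floating-point edge cases at the threshold.
    if Fraction(count, len(votes)) >= min_agreement:
        return label
    return None  # undecided: such an edit would need further votes


if __name__ == "__main__":
    print(label_edit(["vandalism", "vandalism", "regular"]))
    print(label_edit(["regular", "vandalism", "regular", "vandalism"]))
```

Under these assumptions, an edit that falls short of the threshold stays unlabeled rather than being forced into one class, which leaves room for collecting more votes on it.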

References

This publication has 6 references. Only those references related to wikis are included here:

  • "Automatic Vandalism Detection in Wikipedia: Towards a Machine Learning Approach"

Cited by

This publication has 1 citation. Only those publications available in WikiPapers are shown here:

