List of vandalism datasets

From WikiPapers
See also: List of datasets, List of anti-vandalism tools.

This is a list of vandalism datasets available in WikiPapers. Currently, there are 4 datasets of this type.

To create a new dataset, go to Form:Dataset.

Datasets

| Dataset | Size | Language(s) | Description |
|---|---|---|---|
| PAN Wikipedia vandalism corpus 2010 | 447 MB | English | PAN Wikipedia vandalism corpus 2010 (PAN-WVC-10) is a corpus for the evaluation of automatic vandalism detectors for Wikipedia. |
| PAN Wikipedia vandalism corpus 2011 | 370.8 MB | English, German, Spanish | PAN Wikipedia vandalism corpus 2011 (PAN-WVC-11) is a corpus for the evaluation of automatic vandalism detectors for Wikipedia. |
| Webis Wikipedia vandalism corpus | 10 KB | English | Webis Wikipedia vandalism corpus (Webis-WVC-07) is a corpus for the evaluation of automatic vandalism detection algorithms for Wikipedia. |
| Wikipedia Vandalism Corpus (Andrew G. West) | 25.5 MB | English | Wikipedia Vandalism Corpus (Andrew G. West) is a corpus of 5.7 million automatically tagged and 5,000 manually confirmed incidents of vandalism in English Wikipedia. |