List of vandalism datasets
- See also: List of datasets, List of anti-vandalism tools.
This is a list of vandalism datasets available in WikiPapers. Currently, there are 4 datasets of this type.
To create a new "dataset" go to Form:Dataset.
Datasets
Dataset | Size | Language | Description |
---|---|---|---|
PAN Wikipedia vandalism corpus 2010 | 447 MB | English | PAN Wikipedia vandalism corpus 2010 (PAN-WVC-10) is a corpus for the evaluation of automatic vandalism detectors for Wikipedia. |
PAN Wikipedia vandalism corpus 2011 | 370.8 MB | English, German, Spanish | PAN Wikipedia vandalism corpus 2011 (PAN-WVC-11) is a corpus for the evaluation of automatic vandalism detectors for Wikipedia. |
Webis Wikipedia vandalism corpus | 10 KB | English | Webis Wikipedia vandalism corpus (Webis-WVC-07) is a corpus for the evaluation of automatic vandalism detection algorithms for Wikipedia. |
Wikipedia Vandalism Corpus (Andrew G. West) | 25.5 MB | English | Wikipedia Vandalism Corpus (Andrew G. West) is a corpus of 5.7 million automatically tagged and 5,000 manually confirmed incidents of vandalism in English Wikipedia. |