Multilingual Vandalism Detection using Language-Independent & Ex Post Facto Evidence
Abstract There is much literature on Wikipedia vandalism detection. However, this writing addresses two facets given little treatment to date. First, prior efforts emphasize zero-delay detection, classifying edits the moment they are made. If classification can be delayed (e.g., compiling offline distributions), it is possible to leverage ex post facto evidence. This work describes/evaluates several features of this type, which we find to be overwhelmingly strong vandalism indicators. Second, English Wikipedia has been the primary test-bed for research. Yet, Wikipedia has 200+ language editions and use of localized features impairs portability. This work implements an extensive set of language-independent indicators and evaluates them using three corpora (German, English, Spanish). The work then extends to include language-specific signals. Quantifying their performance benefit, we find that such features can moderately increase classifier accuracy, but significant effort and language fluency are required to capture this utility. Aside from these novel aspects, this effort also broadly addresses the task, implementing 65 total features. Evaluation produces 0.840 PR-AUC on the zero-delay task and 0.906 PR-AUC with ex post facto evidence (averaging languages). Performance matches the state-of-the-art (English), sets novel baselines (German, Spanish), and is validated by a first-place finish over the 2011 PAN-CLEF test set.
Bibtextype inproceedings  +
Has author Andrew G. West + , Insup Lee +
Has extra keyword Vandalism detection + , Vandalism + , Machine learning +
Has remote mirror http://repository.upenn.edu/cis_papers/479/  +
Has slides http://www.andrew-g-west.com/docs/pan_11_slides.pdf  +
Language English +
Number of citations by publication 0  +
Number of references by publication 0  +
Peer-reviewed Yes  +
Published in PAN-CLEF +
Related tool STiki + , WikiTrust +
Title Multilingual Vandalism Detection using Language-Independent & Ex Post Facto Evidence +
Type conference paper  +
Year 2011 +
Creation date 12 February 2012 20:00:20  +
Categories Publications without keywords parameter  + , Publications without license parameter  + , Publications without DOI parameter  + , Publications without archive mirror parameter  + , Publications without paywall mirror parameter  + , Conference papers  + , Publications without references parameter  + , Publications  +
Modification date 7 June 2013 04:55:54  +
Date September 2011  +