Vandalism
| vandalism (Alternative names for this keyword) | |
| Related keyword(s) | spam, robot, vandalism detection |
Vandalism is included as a keyword or extra keyword in 4 datasets, 11 tools and 12 publications.
Datasets
| Dataset | Size | Language | Description |
|---|---|---|---|
| PAN Wikipedia vandalism corpus 2010 | 447 MB | English | PAN Wikipedia vandalism corpus 2010 (PAN-WVC-10) is a corpus for the evaluation of automatic vandalism detectors for Wikipedia. |
| PAN Wikipedia vandalism corpus 2011 | 370.8 MB | English, German, Spanish | PAN Wikipedia vandalism corpus 2011 (PAN-WVC-11) is a corpus for the evaluation of automatic vandalism detectors for Wikipedia. |
| Webis Wikipedia vandalism corpus | 10 KB | English | Webis Wikipedia vandalism corpus (Webis-WVC-07) is a corpus for the evaluation of automatic vandalism detection algorithms for Wikipedia. |
| Wikipedia Vandalism Corpus (Andrew G. West) | 25.5 MB | English | Wikipedia Vandalism Corpus (Andrew G. West) is a corpus of 5.7 million automatically tagged and 5,000 manually-confirmed incidents of vandalism in English Wikipedia. |
Tools
| Tool | Operating System(s) | Language(s) | Programming language(s) | License | Description | Image |
|---|---|---|---|---|---|---|
| AVBOT | Cross-platform | English, Spanish | Python | GPL | AVBOT is an anti-vandalism bot in Spanish Wikipedia. It uses regular expressions and scores to detect vandalism. | |
| ClueBot | GNU/Linux | | C, C++, Python, PHP, Bash | | ClueBot is an anti-vandalism bot in English Wikipedia. | |
| CryptoDerk's Vandal Fighter | Cross-platform | English | Java | Open source | | |
| Huggle | Windows | Multilingual | Visual Basic .NET | GPL v3 | | |
| Igloo | Cross-platform | | JavaScript | Open source | | |
| STiki | Cross-platform | English | Java | GPL | STiki is an anti-vandalism tool that consists of server-side detection algorithms and a client-facing GUI. | |
| Salebot | | | | | Salebot is an anti-vandalism bot in French Wikipedia. | |
| Twinkle | Cross-platform | English | JavaScript | | | |
| Vandal Fighter | Cross-platform | English | Java | | | |
| VandalProof | Windows | English | Visual Basic | | | |
| VandalSniper | Cross-platform | English | Mono | | | |
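
The AVBOT entry above describes detection based on regular expressions and scores. A minimal Python sketch of that general idea might look like the following. The patterns, weights, threshold, and function names are hypothetical illustrations and are not taken from AVBOT's actual rule set.

```python
import re

# Hypothetical (pattern, score) rules; more negative means more suspicious.
# These are illustrative assumptions, not AVBOT's real rules.
RULES = [
    (re.compile(r"(.)\1{5,}"), -3),                    # one character repeated 6+ times
    (re.compile(r"!{3,}"), -2),                        # three or more exclamation marks
    (re.compile(r"\b(?:stupid|idiot)\b", re.I), -4),   # sample insult words
]

THRESHOLD = -4  # assumed cut-off: at or below this, the edit is flagged


def score_edit(added_text: str) -> int:
    """Sum the scores of every rule match found in the text an edit added."""
    total = 0
    for pattern, score in RULES:
        total += score * len(pattern.findall(added_text))
    return total


def is_suspicious(added_text: str) -> bool:
    """Flag the edit when its accumulated score reaches the threshold."""
    return score_edit(added_text) <= THRESHOLD
```

A real bot would presumably also weigh who made the edit and what was removed; this sketch only scores the inserted text.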
Publications
| Title | Author(s) | Published in | Language | Date | Abstract | R | C |
|---|---|---|---|---|---|---|---|
| Etiquette in Wikipedia: Weening New Editors into Productive Ones | Ryan Faulkner, Steven Walling, Maryana Pinchuk | WikiSym | English | August 2012 | Currently, the greatest challenge faced by the Wikipedia community involves reversing the decline of active editors on the site – in other words, ensuring that the encyclopedia’s contributors remain sufficiently numerous to fill the roles that keep it relevant. Due to the natural drop-off of old contributors, newcomers must constantly be socialized, trained and retained. However recent research has shown the Wikipedia community is failing to retain a large proportion of productive new contributors and implicates Wikipedia’s semi-automated quality control mechanisms and their interactions with these newcomers as an exacerbating factor. This paper evaluates the effectiveness of minor changes to the normative warning messages sent to newcomers from one of the most prolific of these quality control tools (Huggle) in preserving their rate of contribution. The experimental results suggest that substantial gains in newcomer participation can be attained through inexpensive changes to the wording of the first normative message that new contributors receive. | 0 | 0 |
| Automatic Vandalism Detection in Wikipedia with Active Associative Classification | Maria Sumbana, Marcos André Gonçalves, Rodrigo Silva, Jussara Almeida, Adriano Veloso | Lecture Notes in Computer Science | | 2012 | Wikipedia and other free editing services for collaboratively generated content have quickly grown in popularity. However, the lack of editing control has made these services vulnerable to various types of malicious actions such as vandalism. State-of-the-art vandalism detection methods are based on supervised techniques, thus relying on the availability of large and representative training collections. Building such collections, often with the help of crowdsourcing, is very costly due to a natural skew towards very few vandalism examples in the available data as well as dynamic patterns. Aiming at reducing the cost of building such collections, we present a new active sampling technique coupled with an on-demand associative classification algorithm for Wikipedia vandalism detection. We show that our classifier enhanced with a simple undersampling technique for building the training set outperforms state-of-the-art classifiers such as SVMs and kNNs. Furthermore, by applying active sampling, we are able to reduce the need for training in almost 96% with only a small impact on detection results. | 0 | 0 |
| Multilingual Vandalism Detection using Language-Independent & Ex Post Facto Evidence | Andrew G. West, Insup Lee | PAN-CLEF | English | September 2011 | There is much literature on Wikipedia vandalism detection. However, this writing addresses two facets given little treatment to date. First, prior efforts emphasize zero-delay detection, classifying edits the moment they are made. If classification can be delayed (e.g., compiling offline distributions), it is possible to leverage ex post facto evidence. This work describes/evaluates several features of this type, which we find to be overwhelmingly strong vandalism indicators. Second, English Wikipedia has been the primary test-bed for research. Yet, Wikipedia has 200+ language editions and use of localized features impairs portability. This work implements an extensive set of language-independent indicators and evaluates them using three corpora (German, English, Spanish). The work then extends to include language-specific signals. Quantifying their performance benefit, we find that such features can moderately increase classifier accuracy, but significant effort and language fluency are required to capture this utility. Aside from these novel aspects, this effort also broadly addresses the task, implementing 65 total features. Evaluation produces 0.840 PR-AUC on the zero-delay task and 0.906 PR-AUC with ex post facto evidence (averaging languages). Performance matches the state-of-the-art (English), sets novel baselines (German, Spanish), and is validated by a first-place finish over the 2011 PAN-CLEF test set. | 0 | 0 |
| Wikipedia Vandalism Detection: Combining Natural Language, Metadata, and Reputation Features | B. Thomas Adler, Luca de Alfaro, Santiago M. Mola Velasco, Paolo Rosso, Andrew G. West | Lecture Notes in Computer Science | English | February 2011 | Wikipedia is an online encyclopedia which anyone can edit. While most edits are constructive, about 7% are acts of vandalism. Such behavior is characterized by modifications made in bad faith; introducing spam and other inappropriate content. In this work, we present the results of an effort to integrate three of the leading approaches to Wikipedia vandalism detection: a spatio-temporal analysis of metadata (STiki), a reputation-based system (WikiTrust), and natural language processing features. The performance of the resulting joint system improves the state-of-the-art from all previous methods and establishes a new baseline for Wikipedia vandalism detection. We examine in detail the contribution of the three approaches, both for the task of discovering fresh vandalism, and for the task of locating vandalism in the complete set of Wikipedia revisions. | 0 | 1 |
| AVBOT: Detecting and fixing vandalism in Wikipedia | Emilio J. Rodríguez-Posada | UPGRADE | English | 2011 | Wikipedia is a project which aims to build a free encyclopaedia to spread the sum of all knowledge to every single human being. Today it can be said to be on the road to achieving that goal, having reached the 15 million articles milestone in 270 languages. Furthermore, if we include its sister projects (Wiktionary, Wikibooks, Wikisource,...), it has received more than 1 billion edits in 10 years and now has more than 10 billion page views every month. Compiling an encyclopaedia in a collaborative way has been possible thanks to MediaWiki software. It allows everybody to modify the content available on the site easily. But a problem emerges regarding this model: not all edits are made in good faith. AVBOT is a bot for protecting the Spanish Wikipedia against some undesired modifications known as vandalism. Although AVBOT was developed for Wikipedia, it can be used on any MediaWiki website. It is developed in Python and is free software. In the 2 years it has been in operation it has reverted more than 200,000 vandalism edits, while several clones have been executed, adding thousands of reverts to this count. | 0 | 0 |
| Detecting Wikipedia Vandalism via Spatio-Temporal Analysis of Revision Metadata | Andrew G. West, Sampath Kannan, Insup Lee | EUROSEC | English | April 2010 | Blatantly unproductive edits undermine the quality of the collaboratively-edited encyclopedia, Wikipedia. They not only disseminate dishonest and offensive content, but force editors to waste time undoing such acts of vandalism. Language-processing has been applied to combat these malicious edits, but as with email spam, these filters are evadable and computationally complex. Meanwhile, recent research has shown spatial and temporal features effective in mitigating email spam, while being lightweight and robust. In this paper, we leverage the spatio-temporal properties of revision metadata to detect vandalism on Wikipedia. An administrative form of reversion called rollback enables the tagging of malicious edits, which are contrasted with nonoffending edits in numerous dimensions. Crucially, none of these features require inspection of the article or revision text. Ultimately, a classifier is produced which flags vandalism at performance comparable to the natural-language efforts we intend to complement (85% accuracy at 50% recall). The classifier is scalable (processing 100+ edits a second) and has been used to locate over 5,000 manually-confirmed incidents of vandalism outside our labeled set. | 9 | 3 |
| Crowdsourcing a Wikipedia Vandalism Corpus | Martin Potthast | SIGIR | English | 2010 | We report on the construction of the PAN Wikipedia vandalism corpus, PAN-WVC-10, using Amazon’s Mechanical Turk. The corpus compiles 32 452 edits on 28 468 Wikipedia articles, among which 2 391 vandalism edits have been identified. 753 human annotators cast a total of 193 022 votes on the edits, so that each edit was reviewed by at least 3 annotators, whereas the achieved level of agreement was analyzed in order to label an edit as “regular” or “vandalism.” The corpus is available free of charge. | 6 | 1 |
| The work of sustaining order in Wikipedia: the banning of a vandal | R. Stuart Geiger, David Ribes | | English | 2010 | In this paper, we examine the social roles of software tools in the English-language Wikipedia, specifically focusing on autonomous editing programs and assisted editing tools. This qualitative research builds on recent research in which we quantitatively demonstrate the growing prevalence of such software in recent years. Using trace ethnography, we show how these often-unofficial technologies have fundamentally transformed the nature of editing and administration in Wikipedia. Specifically, we analyze "vandal fighting" as an epistemic process of distributed cognition, highlighting the role of non-human actors in enabling a decentralized activity of collective intelligence. In all, this case shows that software programs are used for more than enforcing policies and standards. These tools enable coordinated yet decentralized action, independent of the specific norms currently in force. | 0 | 2 |
| Wiki Vandalysis - Wikipedia Vandalism Analysis | Manoj Harpalani, Thanadit Phumprao, Megha Bassi, Michael Hart, Rob Johnson | CLEF | English | 2010 | Wikipedia describes itself as the "free encyclopedia that anyone can edit". Along with the helpful volunteers who contribute by improving the articles, a great number of malicious users abuse the open nature of Wikipedia by vandalizing articles. Deterring and reverting vandalism has become one of the major challenges of Wikipedia as its size grows. Wikipedia editors fight vandalism both manually and with automated bots that use regular expressions and other simple rules to recognize malicious edits. Researchers have also proposed Machine Learning algorithms for vandalism detection, but these algorithms are still in their infancy and have much room for improvement. This paper presents an approach to fighting vandalism by extracting various features from the edits for machine learning classification. Our classifier uses information about the editor, the sentiment of the edit, the "quality" of the edit (i.e. spelling errors), and targeted regular expressions to capture patterns common in blatant vandalism, such as insertion of obscene words or multiple exclamations. We have successfully been able to achieve an area under the ROC curve (AUC) of 0.91 on a training set of 15000 human annotated edits and 0.887 on a random sample of 17472 edits from 317443. | 0 | 0 |
| Wikipedia Vandalism Detection Through Machine Learning: Feature Review and New Proposals | Santiago M. Mola Velasco | CLEF | English | 2010 | Wikipedia is an online encyclopedia that anyone can edit. In this open model, some people edit with the intent of harming the integrity of Wikipedia. This is known as vandalism. We extend the framework presented in (Potthast, Stein, and Gerling, 2008) for Wikipedia vandalism detection. In this approach, several vandalism indicating features are extracted from edits in a vandalism corpus and are fed to a supervised learning algorithm. The best performing classifiers were LogitBoost and Random Forest. Our classifier, a Random Forest, obtained an AUC of 0.92236, ranking in the first place of the PAN’10 Wikipedia vandalism detection task. | 4 | 0 |
| Detector y corrector automático de ediciones maliciosas en Wikipedia | Emilio J. Rodríguez-Posada | | Spanish | 2009 | The project develops AVBOT (an acronym for Anti-Vandalism BOT), a program that automatically detects and corrects malicious edits in Spanish Wikipedia. It is written in Python and uses the pywikipediabot and python-irclib libraries. | 0 | 0 |
| ClueBot and Vandalism in Wikipedia | Jacobi Carter | | English | 2008 | | 0 | 0 |
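
Several abstracts above report detector quality as an area under the precision-recall curve (PR-AUC). As a rough illustration of what such a figure measures, the sketch below computes average precision, one common step-wise estimate of PR-AUC, from classifier scores and ground-truth labels. The function name and toy data are hypothetical, and the listed papers may have used a different estimator.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of the precision values measured at each true vandalism edit.

    scores: classifier outputs, higher means more likely vandalism.
    labels: 1 for vandalism, 0 for a regular edit.
    Tied scores are ranked in input order, which is a simplification.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")

    # Rank edits from most to least suspicious.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            precision_sum += true_positives / rank  # precision at this recall step
    return precision_sum / total_positives


# Toy data (hypothetical): four edits, two of which are vandalism.
toy_scores = [0.9, 0.8, 0.7, 0.6]
toy_labels = [1, 0, 1, 0]
toy_ap = average_precision(toy_scores, toy_labels)
```

On the toy data the two vandalism edits sit at ranks 1 and 3, so the precisions averaged are 1/1 and 2/3.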
- See also: List of anti-vandalism tools.
- http://en.wikipedia.org/wiki/Wikipedia:Vandalism
- http://en.wikipedia.org/wiki/User:Emijrp/Anti-vandalism_bot_census
- http://en.wikipedia.org/wiki/Category:Wikipedia_anti-vandal_bots
- http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Vandalism_studies
- Wmcharts reverts stats http://toolserver.org/~emijrp/wmcharts/wmchart0008.html
- http://countervandalism.net
