Vandalism

Vandalism is included as a keyword or extra keyword in 4 datasets, 11 tools and 18 publications.

Datasets

Dataset | Size | Language | Description
PAN Wikipedia vandalism corpus 2010 | 447 MB | English | PAN Wikipedia vandalism corpus 2010 (PAN-WVC-10) is a corpus for the evaluation of automatic vandalism detectors for Wikipedia.
PAN Wikipedia vandalism corpus 2011 | 370.8 MB | English, German, Spanish | PAN Wikipedia vandalism corpus 2011 (PAN-WVC-11) is a corpus for the evaluation of automatic vandalism detectors for Wikipedia.
Webis Wikipedia vandalism corpus | 10 KB | English | Webis Wikipedia vandalism corpus (Webis-WVC-07) is a corpus for the evaluation of automatic vandalism detection algorithms for Wikipedia.
Wikipedia Vandalism Corpus (Andrew G. West) | 25.5 MB | English | Wikipedia Vandalism Corpus (Andrew G. West) is a corpus of 5.7 million automatically tagged and 5,000 manually-confirmed incidents of vandalism in English Wikipedia.

Tools

Tool | Operating System(s) | Language(s) | Programming language(s) | License | Description | Image
AVBOT | Cross-platform | English, Spanish | Python | GPL | AVBOT is an anti-vandalism bot on the Spanish Wikipedia. It uses regular expressions and scores to detect vandalism. | Avbot logo.png
ClueBot | GNU/Linux | | C, C++, Python, PHP, Bash | | ClueBot is an anti-vandalism bot on the English Wikipedia. |
CryptoDerk's Vandal Fighter | Cross-platform | English | Java | Open source | |
Huggle | Windows | | Visual Basic .NET | GPL v3 | |
Igloo | Cross-platform | | JavaScript | Open source | |
STiki | Cross-platform | English | Java | GPL | STiki is an anti-vandalism tool that consists of server-side detection algorithms and a client-facing GUI. | STiki logo.png
Salebot | | | | | Salebot is an anti-vandalism bot on the French Wikipedia. |
Twinkle | Cross-platform | English | JavaScript | | |
Vandal Fighter | Cross-platform | English | Java | | | Vandal Fighter - Live RC.png
VandalProof | Windows | English | Visual Basic | | |
VandalSniper | Cross-platform | English | Mono | | |


Publications

Title Author(s) Published in Language Date Abstract R C
From open-source software to Wikipedia: 'Backgrounding' trust by collective monitoring and reputation tracking De Laat P.B. Ethics and Information Technology English 2014 Open-content communities that focus on co-creation without requirements for entry have to face the issue of institutional trust in contributors. This research investigates the various ways in which these communities manage this issue. It is shown that communities of open-source software continue to rely mainly on hierarchy (reserving write-access for higher echelons), which substitutes (the need for) trust. Encyclopedic communities, though, largely avoid this solution. In the particular case of Wikipedia, which is confronted with persistent vandalism, another arrangement has been pioneered instead. Trust (i.e. full write-access) is 'backgrounded' by means of a permanent mobilization of Wikipedians to monitor incoming edits. Computational approaches have been developed for the purpose, yielding both sophisticated monitoring tools that are used by human patrollers, and bots that operate autonomously. Measures of reputation are also under investigation within Wikipedia; their incorporation in monitoring efforts, as an indicator of the trustworthiness of editors, is envisaged. These collective monitoring efforts are interpreted as focusing on avoiding possible damage being inflicted on Wikipedian spaces, thereby being allowed to keep the discretionary powers of editing intact for all users. Further, the essential differences between backgrounding and substituting trust are elaborated. Finally, it is argued that the Wikipedian monitoring of new edits, especially by its heavy reliance on computational tools, raises a number of moral questions that need to be answered urgently. 0 0
A game theoretic analysis of collaboration in Wikipedia Anand S., Ofer Arazy, Mandayam N.B., Oded Nov
Lecture Notes in Computer Science English 2013 Peer production projects such as Wikipedia or open-source software development allow volunteers to collectively create knowledge-based products. The inclusive nature of such projects poses difficult challenges for ensuring trustworthiness and combating vandalism. Prior studies in the area deal with descriptive aspects of peer production, failing to capture the idea that while contributors collaborate, they also compete for status in the community and for imposing their views on the product. In this paper, we investigate collaborative authoring in Wikipedia, where contributors append and overwrite previous contributions to a page. We assume that a contributor's goal is to maximize ownership of content sections, such that content owned (i.e. originated) by her survived the most recent revision of the page. We model contributors' interactions to increase their content ownership as a non-cooperative game, where a player's utility is associated with content owned and cost is a function of effort expended. Our results capture several real-life aspects of contributors' interactions within peer-production projects. Namely, we show that at the Nash equilibrium there is an inverse relationship between the effort required to make a contribution and the survival of a contributor's content. In other words, the majority of the content that survives is necessarily contributed by experts who expend relatively less effort than non-experts. An empirical analysis of Wikipedia articles provides support for our model's predictions. Implications for research and practice are discussed in the context of trustworthy collaboration as well as vandalism. 0 0
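
The trade-off described in the abstract above (utility from owned content versus the cost of effort) can be illustrated with a toy best-response computation. The functional forms below (ownership proportional to productivity-weighted effort, linear effort cost) are assumptions made purely for illustration, not the model from the paper.

```python
# Toy ownership game: each contributor chooses an effort level, owns a share
# of content proportional to productivity * effort, and pays a linear cost
# for effort. Functional forms are illustrative assumptions only.

def utility(efforts, productivity, cost, i):
    """Utility of player i: owned content share minus effort cost."""
    weighted = [p * e for p, e in zip(productivity, efforts)]
    total = sum(weighted)
    share = weighted[i] / total if total > 0 else 0.0
    return share - cost[i] * efforts[i]

def best_response(efforts, productivity, cost, i, grid=200):
    """Search a grid of effort levels for player i's best reply."""
    best_e, best_u = 0.0, float("-inf")
    for k in range(1, grid + 1):
        e = k / grid
        trial = list(efforts)
        trial[i] = e
        u = utility(trial, productivity, cost, i)
        if u > best_u:
            best_e, best_u = e, u
    return best_e

# Two players: an "expert" (high productivity) and a "non-expert".
# Iterated best response approximates a Nash equilibrium.
productivity = [3.0, 1.0]
cost = [0.5, 0.5]
efforts = [0.5, 0.5]
for _ in range(50):
    efforts = [best_response(efforts, productivity, cost, i) for i in range(2)]

shares = [p * e for p, e in zip(productivity, efforts)]
total = sum(shares)
print("equilibrium efforts:", [round(e, 3) for e in efforts])
print("content ownership  :", [round(s / total, 3) for s in shares])
```
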
Assessing quality score of Wikipedia articles using mutual evaluation of editors and texts Yu Suzuki, Masatoshi Yoshikawa
International Conference on Information and Knowledge Management, Proceedings English 2013 In this paper, we propose a method for assessing quality scores of Wikipedia articles by mutually evaluating editors and texts. The survival-ratio-based approach is a major approach to assessing article quality. In this approach, when a text survives beyond multiple edits, the text is assessed as good quality, because poor-quality texts have a high probability of being deleted by editors. However, many vandals, that is, low-quality editors, frequently delete good-quality texts, which improperly decreases the survival ratios of those texts. As a result, many good-quality texts are unfairly assessed as poor quality. In our method, we take editor quality scores into account when calculating text quality scores, which decreases the impact of vandals on text quality. With this improvement, the accuracy of the text quality score should improve. However, an inherent problem with this idea is that the editor quality scores are themselves calculated from the text quality scores. To solve this problem, we mutually calculate the editor and text quality scores until they converge. In this paper, we prove that the text quality score converges. Our experimental evaluation confirmed that the proposed method can accurately assess text quality scores. 0 0
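
A minimal sketch of the mutual-evaluation idea described above, assuming simplified update rules: a text's quality is the average quality of the editors who kept it when revising, and an editor's quality is the average quality of the texts they wrote. The paper's exact formulation and convergence proof are not reproduced here.

```python
# Minimal sketch of mutually evaluating editors and texts: iterate the two
# score vectors until they stop changing. The update rules below are
# simplified assumptions, not the exact formulation from the paper.

# texts_by_editor[e] = texts written by editor e; survivors[t] = editors who
# kept text t when revising the article (i.e. did not delete it).
texts_by_editor = {"alice": ["t1", "t2"], "bob": ["t3"], "vandal": ["t4"]}
survivors = {"t1": ["bob", "alice"], "t2": ["bob"], "t3": ["alice"], "t4": []}

editor_q = {e: 0.5 for e in texts_by_editor}            # start neutral
text_q = {t: 0.5 for t in survivors}

for _ in range(100):
    new_text_q = {
        t: (sum(editor_q[e] for e in keepers) / len(keepers)) if keepers else 0.0
        for t, keepers in survivors.items()
    }
    new_editor_q = {
        e: sum(new_text_q[t] for t in texts) / len(texts)
        for e, texts in texts_by_editor.items()
    }
    delta = max(abs(new_editor_q[e] - editor_q[e]) for e in editor_q)
    editor_q, text_q = new_editor_q, new_text_q
    if delta < 1e-9:                                    # converged
        break

print("editor quality:", editor_q)
print("text quality  :", text_q)
```
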
Etiquette in Wikipedia: Weening New Editors into Productive Ones Ryan Faulkner, Steven Walling, Maryana Pinchuk
WikiSym English August 2012 Currently, the greatest challenge faced by the Wikipedia community involves reversing the decline of active editors on the site – in other words, ensuring that the encyclopedia’s contributors remain sufficiently numerous to fill the roles that keep it relevant. Due to the natural drop-off of old contributors, newcomers must constantly be socialized, trained and retained. However, recent research has shown the Wikipedia community is failing to retain a large proportion of productive new contributors and implicates Wikipedia’s semi-automated quality control mechanisms and their interactions with these newcomers as an exacerbating factor. This paper evaluates the effectiveness of minor changes to the normative warning messages sent to newcomers from one of the most prolific of these quality control tools (Huggle) in preserving their rate of contribution. The experimental results suggest that substantial gains in newcomer participation can be attained through inexpensive changes to the wording of the first normative message that new contributors receive. 0 1
Coercion or empowerment? Moderation of content in Wikipedia as 'essentially contested' bureaucratic rules De Laat P.B. Ethics and Information Technology English 2012 In communities of user-generated content, systems for the management of content and/or their contributors are usually accepted without much protest. Not so, however, in the case of Wikipedia, in which the proposal to introduce a system of review for new edits (in order to counter vandalism) led to heated discussions. This debate is analysed, and arguments of both supporters and opponents (writing in English, German and French) are extracted from Wikipedian archives. In order to better understand this division of the minds, an analogy is drawn with theories of bureaucracy as developed for real-life organizations. From these it transpires that bureaucratic rules may be perceived as springing from either a control logic or an enabling logic. In Wikipedia, then, both perceptions were at work, depending on the underlying views of participants. Wikipedians either rejected the proposed scheme (because it is antithetical to their conception of Wikipedia as a community) or endorsed it (because it is consonant with their conception of Wikipedia as an organization with clearly defined boundaries). Are other open-content communities susceptible to the same kind of 'essential contestation'? 0 0
Etiquette in Wikipedia: Weening new editors into productive ones Ryan Faulkner, Steven Walling, Maryana Pinchuk
WikiSym 2012 English 2012 Currently, the greatest challenge faced by the Wikipedia community involves reversing the decline of active editors on the site - in other words, ensuring that the encyclopedia's contributors remain sufficiently numerous to fill the roles that keep it relevant. Due to the natural drop-off of old contributors, newcomers must constantly be socialized, trained and retained. However, recent research has shown the Wikipedia community is failing to retain a large proportion of productive new contributors and implicates Wikipedia's semi-automated quality control mechanisms and their interactions with these newcomers as an exacerbating factor. This paper evaluates the effectiveness of minor changes to the normative warning messages sent to newcomers from one of the most prolific of these quality control tools (Huggle) in preserving their rate of contribution. The experimental results suggest that substantial gains in newcomer participation can be attained through inexpensive changes to the wording of the first normative message that new contributors receive. 0 1
Feature transformation method enhanced vandalism detection in Wikipedia Chang T., Hong Lin, Yi-Sheng Lin
Lecture Notes in Computer Science English 2012 A prime example of a Web 2.0 application is Wikipedia, an online encyclopedia where anyone can edit and share information. However, blatantly unproductive edits greatly undermine the quality of Wikipedia, and such irresponsible acts force editors to waste time undoing vandalism. For the purpose of improving information quality on Wikipedia and freeing maintainers from such repetitive tasks, machine learning methods have been proposed to detect vandalism automatically. However, most of them focus on mining new features, of which there seems to be an inexhaustible supply. Therefore, the question of how to make the best use of these features needs to be tackled. In this paper, we leverage feature transformation techniques to analyze the features and propose a framework using these methods to enhance detection. Experimental results on the public dataset PAN-WVC-10 show that our method is effective and provides another useful way to help detect vandalism in Wikipedia. 0 0
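
A minimal sketch of the general idea above: apply a feature transformation to per-edit features before the supervised classifier. The transformation chosen here (standardization), the classifier, and the synthetic features are illustrative assumptions, not the methods evaluated in the paper.

```python
# Sketch: transform edit features before classification. StandardScaler is
# just one possible transformation, used as an illustrative assumption; the
# paper studies feature transformation methods more broadly.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Toy stand-ins for per-edit features (e.g. size change, upper-case ratio,
# anonymous-editor flag); label 1 = vandalism, 0 = regular edit.
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0.8).astype(int)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print("cross-validated ROC AUC:", scores.mean().round(3))
```
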
Multilingual Vandalism Detection using Language-Independent & Ex Post Facto Evidence Andrew G. West, Insup Lee
PAN-CLEF English September 2011 There is much literature on Wikipedia vandalism detection. However, this writing addresses two facets given little treatment to date. First, prior efforts emphasize zero-delay detection, classifying edits the moment they are made. If classification can be delayed (e.g., compiling offline distributions), it is possible to leverage ex post facto evidence. This work describes/evaluates several features of this type, which we find to be overwhelmingly strong vandalism indicators.

Second, English Wikipedia has been the primary test-bed for research. Yet, Wikipedia has 200+ language editions and use of localized features impairs portability. This work implements an extensive set of language-independent indicators and evaluates them using three corpora (German, English, Spanish). The work then extends to include language-specific signals. Quantifying their performance benefit, we find that such features can moderately increase classifier accuracy, but significant effort and language fluency are required to capture this utility.

Aside from these novel aspects, this effort also broadly addresses the task, implementing 65 total features. Evaluation produces 0.840 PR-AUC on the zero-delay task and 0.906 PR-AUC with ex post facto evidence (averaging languages). Performance matches the state-of-the-art (English), sets novel baselines (German, Spanish), and is validated by a first-place finish over the 2011 PAN-CLEF test set.
0 0
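
A sketch of how PR-AUC, the metric reported above, can be computed for a classifier trained on metadata-only (hence language-independent) features. The feature columns and labels are synthetic stand-ins, not the 65 features implemented in the paper.

```python
# Sketch: score edits with metadata-only (language-independent) features and
# report area under the precision-recall curve (PR-AUC). Features and labels
# below are synthetic stand-ins for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000
# Toy metadata-only features per edit: editor-is-anonymous flag, account age
# in seconds, absolute size of the change in bytes, local hour of the edit.
X = np.column_stack([
    rng.integers(0, 2, n),
    rng.exponential(1e6, n),
    rng.exponential(300, n),
    rng.integers(0, 24, n),
])
y = ((X[:, 0] == 1) & (X[:, 2] > 400)).astype(int)  # toy vandalism labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]
print("PR-AUC:", round(average_precision_score(y_te, scores), 3))
```
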
Wikipedia Vandalism Detection: Combining Natural Language, Metadata, and Reputation Features B. Thomas Adler, Luca de Alfaro, Santiago M. Mola Velasco, Paolo Rosso, Andrew G. West
Lecture Notes in Computer Science English February 2011 Wikipedia is an online encyclopedia which anyone can edit. While most edits are constructive, about 7% are acts of vandalism. Such behavior is characterized by modifications made in bad faith, introducing spam and other inappropriate content. In this work, we present the results of an effort to integrate three of the leading approaches to Wikipedia vandalism detection: a spatio-temporal analysis of metadata (STiki), a reputation-based system (WikiTrust), and natural language processing features. The performance of the resulting joint system improves on the state of the art set by all previous methods and establishes a new baseline for Wikipedia vandalism detection. We examine in detail the contribution of the three approaches, both for the task of discovering fresh vandalism, and for the task of locating vandalism in the complete set of Wikipedia revisions. 0 1
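
A minimal sketch of the integration idea: metadata, reputation, and natural-language feature groups are concatenated into one feature matrix for a single classifier. The individual features are illustrative placeholders, not those of STiki, WikiTrust, or the paper's NLP feature set.

```python
# Sketch: combine metadata, reputation, and language feature groups for one
# classifier. All feature values below are illustrative placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def metadata_features(edit):
    return [edit["anonymous"], edit["size_delta"], edit["hour"]]

def reputation_features(edit):
    return [edit["author_reputation"]]          # e.g. a WikiTrust-style score

def language_features(edit):
    text = edit["added_text"].lower()
    return [text.count("!"), sum(text.count(w) for w in ("stupid", "dumb"))]

edits = [
    {"anonymous": 1, "size_delta": 35, "hour": 3, "author_reputation": 0.1,
     "added_text": "this page is dumb!!!", "vandalism": 1},
    {"anonymous": 0, "size_delta": 412, "hour": 14, "author_reputation": 0.9,
     "added_text": "Added a sourced paragraph on early history.", "vandalism": 0},
]

X = np.array([metadata_features(e) + reputation_features(e) + language_features(e)
              for e in edits])
y = np.array([e["vandalism"] for e in edits])
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
print(clf.predict_proba(X)[:, 1])   # vandalism probability for each edit
```
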
AVBOT: Detecting and fixing vandalism in Wikipedia Emilio J. Rodríguez-Posada UPGRADE English 2011 Wikipedia is a project which aims to build a free encyclopaedia to spread the sum of all knowledge to every single human being. Today it can be said to be on the road to achieving that goal, having reached the 15 million articles milestone in 270 languages. Furthermore, if we include its sister projects (Wiktionary, Wikibooks, Wikisource,...), it has received more than 1 billion edits in 10 years and now has more than 10 billion page views every month. Compiling an encyclopaedia in a collaborative way has been possible thanks to MediaWiki software. It allows everybody to modify the content available on the site easily. But a problem emerges regarding this model: not all edits are made in good faith. AVBOT is a bot for protecting the Spanish Wikipedia against some undesired modifications known as vandalism. Although AVBOT was developed for Wikipedia, it can be used on any MediaWiki website. It is developed in Python and is free software. In the 2 years it has been in operation it has reverted more than 200,000 vandalism edits, while several clones have been executed, adding thousands of reverts to this count. 0 0
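
A minimal sketch of the regular-expressions-and-scores approach mentioned above: each matching pattern adds to an edit's score, and edits above a threshold are flagged for revert. The patterns, weights, and threshold are invented for illustration and are not AVBOT's actual rules.

```python
# Sketch of regex-plus-score vandalism detection: each matching pattern adds
# to an edit's score, and edits above a threshold are flagged for revert.
# Patterns, weights, and the threshold are illustrative assumptions only.
import re

RULES = [
    (re.compile(r"(.)\1{5,}"), 2),                 # same character repeated 6+ times
    (re.compile(r"[A-Z]{10,}"), 2),                # long shouting in capitals
    (re.compile(r"\b(idiota|tonto)\b", re.I), 3),  # sample insult words
]
THRESHOLD = 3

def vandalism_score(added_text: str) -> int:
    return sum(weight for pattern, weight in RULES if pattern.search(added_text))

def should_revert(added_text: str) -> bool:
    return vandalism_score(added_text) >= THRESHOLD

print(should_revert("JAJAJAJAJAJAJAJA eres un tonto"))        # True
print(should_revert("Añadida referencia al censo de 2010."))  # False
```
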
Vandalism and conflict resolution in wikipedia. An empirical analysis on how a large-scale web-based community deals with breaches of the online peace Roessing T. Proceedings of the IADIS International Conferences - Web Based Communities and Social Media 2011, Social Media 2011, Internet Applications and Research 2011, Part of the IADIS, MCCSIS 2011 English 2011 The paper discusses the proceedings on the anti-vandalism page of the German language version of the online encyclopedia Wikipedia. Research questions address the structure of vandalism reports, the distribution over time of day and the relationship between conflict potential and conflict resolution. A quantitative analysis of 500 vandalism reports reveals that the anti-vandalism page is a good indicator for conflicts within the community and its deficits in dealing with them. 0 0
Detecting Wikipedia Vandalism via Spatio-Temporal Analysis of Revision Metadata Andrew G. West, Sampath Kannan, Insup Lee
EUROSEC English April 2010 Blatantly unproductive edits undermine the quality of the collaboratively-edited encyclopedia, Wikipedia. They not only disseminate dishonest and offensive content, but force editors to waste time undoing such acts of vandalism. Language-processing has been applied to combat these malicious edits, but as with email spam, these filters are evadable and computationally complex. Meanwhile, recent research has shown spatial and temporal features effective in mitigating email spam, while being lightweight and robust. In this paper, we leverage the spatio-temporal properties of revision metadata to detect vandalism on Wikipedia. An administrative form of reversion called rollback enables the tagging of malicious edits, which are contrasted with nonoffending edits in numerous dimensions. Crucially, none of these features require inspection of the article or revision text. Ultimately, a classifier is produced which flags vandalism at performance comparable to the natural-language efforts we intend to complement (85% accuracy at 50% recall). The classifier is scalable (processing 100+ edits a second) and has been used to locate over 5,000 manually-confirmed incidents of vandalism outside our labeled set. 9 3
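
A sketch in the spirit of the approach above: vandalism-indicative features are derived from revision metadata alone, without inspecting article or revision text. The specific features shown are simplified assumptions, not the exact STiki feature set.

```python
# Sketch: derive vandalism-indicative features from revision metadata only
# (no article text). Feature choices are simplified assumptions inspired by
# the description above.
from datetime import datetime, timezone

def metadata_features(rev, registration_times):
    """rev: one revision's metadata; registration_times: user -> datetime."""
    ts = rev["timestamp"]
    registered = rev["user"] in registration_times
    account_age_days = (
        (ts - registration_times[rev["user"]]).days if registered else 0
    )
    return {
        "is_anonymous": int(not registered),
        "account_age_days": account_age_days,
        "local_hour": ts.hour,               # time-of-day signal
        "is_weekend": int(ts.weekday() >= 5),
        "comment_length": len(rev.get("comment", "")),
    }

registration_times = {"Example_user": datetime(2008, 5, 1, tzinfo=timezone.utc)}
rev = {
    "user": "203.0.113.7",                   # an IP editor (anonymous)
    "timestamp": datetime(2010, 4, 10, 2, 30, tzinfo=timezone.utc),
    "comment": "",
}
print(metadata_features(rev, registration_times))
```
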
Crowdsourcing a Wikipedia Vandalism Corpus Martin Potthast SIGIR English 2010 We report on the construction of the PAN Wikipedia vandalism corpus, PAN-WVC-10, using Amazon’s Mechanical Turk. The corpus compiles 32 452 edits on 28 468 Wikipedia articles, among which 2 391 vandalism edits have been identified. 753 human annotators cast a total of 193 022 votes on the edits, so that each edit was reviewed by at least 3 annotators, whereas the achieved level of agreement was analyzed in order to label an edit as “regular” or “vandalism.” The corpus is available free of charge. 6 1
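
A minimal sketch of turning multiple annotator votes per edit into a corpus label by majority agreement. The thresholds are assumptions; the agreement analysis actually used for PAN-WVC-10 is more involved.

```python
# Sketch: aggregate crowdsourced annotator votes into per-edit labels.
# An edit gets a label only when a clear majority of its (>= 3) annotators
# agree; thresholds are illustrative assumptions.
from collections import Counter

def label_edit(votes, min_votes=3, min_agreement=2 / 3):
    """votes: list of 'vandalism' / 'regular' strings from annotators."""
    if len(votes) < min_votes:
        return "needs more annotation"
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else "undecided"

print(label_edit(["vandalism", "vandalism", "regular"]))           # vandalism
print(label_edit(["vandalism", "regular", "regular", "regular"]))  # regular
print(label_edit(["vandalism", "regular"]))                        # needs more annotation
```
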
The work of sustaining order in Wikipedia: the banning of a vandal R. Stuart Geiger, David Ribes
English 2010 In this paper, we examine the social roles of software tools in the English-language Wikipedia, specifically focusing on autonomous editing programs and assisted editing tools. This qualitative research builds on recent research in which we quantitatively demonstrate the growing prevalence of such software in recent years. Using trace ethnography, we show how these often-unofficial technologies have fundamentally transformed the nature of editing and administration in Wikipedia. Specifically, we analyze "vandal fighting" as an epistemic process of distributed cognition, highlighting the role of non-human actors in enabling a decentralized activity of collective intelligence. In all, this case shows that software programs are used for more than enforcing policies and standards. These tools enable coordinated yet decentralized action, independent of the specific norms currently in force. 0 5
Wiki Vandalysis - Wikipedia Vandalism Analysis Manoj Harpalani, Thanadit Phumprao, Megha Bassi, Michael Hart, Rob Johnson
CLEF English 2010 Wikipedia describes itself as the "free encyclopedia that anyone can edit". Along with the helpful volunteers who contribute by improving the articles, a great number of malicious users abuse the open nature of Wikipedia by vandalizing articles. Deterring and reverting vandalism has become one of the major challenges of Wikipedia as its size grows. Wikipedia editors fight vandalism both manually and with automated bots that use regular expressions and other simple rules to recognize malicious edits. Researchers have also proposed Machine Learning algorithms for vandalism detection, but these algorithms are still in their infancy and have much room for improvement. This paper presents an approach to fighting vandalism by extracting various features from the edits for machine learning classification. Our classifier uses information about the editor, the sentiment of the edit, the "quality" of the edit (i.e. spelling errors), and targeted regular expressions to capture patterns common in blatant vandalism, such as insertion of obscene words or multiple exclamations. We have successfully been able to achieve an area under the ROC curve (AUC) of 0.91 on a training set of 15000 human annotated edits and 0.887 on a random sample of 17472 edits from 317443.
0 0
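
A sketch of extracting the kinds of per-edit features listed above (editor information, targeted regular expressions for obscenities and repeated exclamation marks, and a crude spelling proxy). The word lists, patterns, and dictionary are illustrative assumptions, not the paper's feature definitions.

```python
# Sketch: per-edit features of the kinds listed above. Word lists, patterns,
# and the tiny dictionary used for the spelling proxy are illustrative.
import re

OBSCENE = re.compile(r"\b(poop|crap)\b", re.IGNORECASE)   # stand-in word list
EXCLAMATIONS = re.compile(r"!{2,}")
SMALL_DICTIONARY = {"the", "school", "is", "a", "founded", "in", "serves", "pupils"}

def edit_features(added_text: str, editor_is_anonymous: bool) -> dict:
    words = re.findall(r"[a-z']+", added_text.lower())
    misspelled = [w for w in words if w not in SMALL_DICTIONARY]
    return {
        "anonymous": int(editor_is_anonymous),
        "obscenity_hits": len(OBSCENE.findall(added_text)),
        "exclamation_runs": len(EXCLAMATIONS.findall(added_text)),
        "misspelling_ratio": len(misspelled) / len(words) if words else 0.0,
        "chars_added": len(added_text),
    }

print(edit_features("the school is a POOP!!!", editor_is_anonymous=True))
print(edit_features("Founded in 1871, the school serves 300 pupils.",
                    editor_is_anonymous=False))
```
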
Wikipedia Vandalism Detection Through Machine Learning: Feature Review and New Proposals Santiago M. Mola Velasco CLEF English 2010 Wikipedia is an online encyclopedia that anyone can edit. In this open model, some people edit with the intent of harming the integrity of Wikipedia. This is known as vandalism. We extend the framework presented in (Potthast, Stein, and Gerling, 2008) for Wikipedia vandalism detection. In this approach, several vandalism-indicating features are extracted from edits in a vandalism corpus and are fed to a supervised learning algorithm. The best performing classifiers were LogitBoost and Random Forest. Our classifier, a Random Forest, obtained an AUC of 0.92236, ranking first in the PAN’10 Wikipedia vandalism detection task. 4 0
Detector y corrector automático de ediciones maliciosas en Wikipedia Emilio J. Rodríguez-Posada Spanish 2009 The project develops AVBOT (an acronym for Anti-Vandalism BOT), a program that automatically detects and corrects malicious edits on the Spanish Wikipedia. It is written in Python and uses the pywikipediabot and python-irclib libraries. 0 0
ClueBot and Vandalism in Wikipedia Jacobi Carter English 2008 0 1
See also: List of anti-vandalism tools.