Reverts Revisited: Accurate Revert Detection in Wikipedia
|Author(s)||Fabian Flöck, Denny Vrandecic, Elena Simperl|
|Published in||Hypertext and Social Media 2012|
|Keyword(s)||Wikipedia, revert detection, editing behavior, user modeling, collaboration systems, community-driven content creation, social dynamics (Extra: Revert, Content removal, Accuracy, Behaviour, Collaboration, Data mining, Editorial activity)|
Reverts Revisited: Accurate Revert Detection in Wikipedia is a 2012 conference paper written in English by Fabian Flöck, Denny Vrandecic, and Elena Simperl, and published in Hypertext and Social Media 2012.
Wikipedia is commonly used as a proving ground for research in collaborative systems. This is likely due to its popularity and scale, but also to the fact that large amounts of data about its formation and evolution are freely available to inform and validate theories and models of online collaboration. As part of the development of such approaches, revert detection is often performed as an important pre-processing step in tasks as diverse as the extraction of implicit networks of editors, the analysis of edit or editor features, and the removal of noise when analyzing the emergence of the content of an article. The current state of the art in revert detection is based on a rather naïve approach, which identifies revision duplicates based on MD5 hash values. This is an efficient, but not very precise, technique that forms the basis for the majority of research based on revert relations in Wikipedia. In this paper we prove that this method has a number of important drawbacks: it detects only a limited number of reverts, misclassifies too many edits as reverts, and does not distinguish between complete and partial reverts. This is very likely to hamper the accurate interpretation of the findings of revert-related research. We introduce an improved algorithm for the detection of reverts, based on word tokens added or deleted, that addresses these drawbacks. We report on the results of a user study and other tests demonstrating the considerable gains in accuracy and coverage achieved by our method, and argue for a positive trade-off, in certain research scenarios, between these improvements and our algorithm’s increased runtime.
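To make the baseline concrete, the MD5-based identity approach that the abstract criticizes can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names `md5_of` and `detect_identity_reverts`, the in-memory list of revision texts, and the output tuple layout are all assumptions made for this example.

```python
import hashlib


def md5_of(text: str) -> str:
    """Return the MD5 hex digest of a revision's full text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def detect_identity_reverts(revisions):
    """Find identity reverts in a chronologically ordered list of revision texts.

    A revision i counts as a revert if its hash equals that of an earlier
    revision j with at least one revision in between. Returns tuples of
    (reverting_index, restored_index, reverted_indices).
    Hypothetical helper for illustration only.
    """
    last_seen = {}  # hash -> index of the most recent revision with that hash
    reverts = []
    for i, text in enumerate(revisions):
        digest = md5_of(text)
        j = last_seen.get(digest)
        # Skip j == i - 1: an unchanged re-save undoes nothing.
        if j is not None and j < i - 1:
            reverts.append((i, j, list(range(j + 1, i))))
        last_seen[digest] = i
    return reverts


# Made-up toy history, oldest revision first.
history = [
    "the cat sat",
    "the cat sat down",   # adds a word
    "the cat sat",        # exact restore of revision 0: detected
    "the dog sat",        # replaces a word
    "the cat sat.",       # undoes the change but adds a period: missed
]
print(detect_identity_reverts(history))
```

In this toy history the last revision undoes the previous edit but is not byte-identical to any earlier revision, so a hash comparison does not flag it. This is the kind of partial revert that, according to the abstract, the word-token approach is meant to capture.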
This publication has 13 references. Only those references related to wikis are included here:
- "Identifying discarded work in wiki article history"
- "Power of the few vs. wisdom of the crowd: Wikipedia and the rise of the bourgeoisie"
- "Herding the cats: the influence of groups in coordinating peer production"
- "WP:Clubhouse? An exploration of Wikipedia’s gender imbalance"
This publication is probably cited by others, but WikiPapers has no articles for the citing works.