Language of vandalism: Improving Wikipedia vandalism detection via stylometric analysis
|Author(s)||Harpalani M., Hart M., Singh S., Johnson R., Choi Y.|
|Published in||ACL-HLT 2011 - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies|
|Keyword(s)||Unknown (Extra: Collective intelligences, Lexico-syntactic patterns, N-grams, Probabilistic context free grammars, Statistical models, Syntactic patterns, Wikipedia, Computational linguistics, Context free grammars, Metadata, Syntactics, Websites, Behavioral research)|
Language of vandalism: Improving Wikipedia vandalism detection via stylometric analysis is a 2011 conference paper written in English by Harpalani M., Hart M., Singh S., Johnson R., and Choi Y., published in ACL-HLT 2011 - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies.
Community-based knowledge forums, such as Wikipedia, are susceptible to vandalism, i.e., ill-intentioned contributions that are detrimental to the quality of collective intelligence. Most previous work to date relies on shallow lexico-syntactic patterns and metadata to automatically detect vandalism in Wikipedia. In this paper, we explore more linguistically motivated approaches to vandalism detection. In particular, we hypothesize that textual vandalism constitutes a unique genre where a group of people share a similar linguistic behavior. Experimental results suggest that (1) statistical models give evidence of unique language styles in vandalism, and that (2) deep syntactic patterns based on probabilistic context-free grammars (PCFG) discriminate vandalism more effectively than shallow lexico-syntactic patterns based on n-grams.
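To make the n-gram baseline mentioned in the abstract concrete, here is a minimal sketch of a language-model classifier: one word-bigram model is trained on vandalism edits, another on regular edits, and a new edit is assigned to whichever model gives it the higher log-probability. This is not the authors' implementation. The class `BigramModel`, the functions `bigrams`, `llr`, and `classify`, the add-one smoothing, and the toy edits are all assumptions made for illustration. The PCFG-based features that the paper reports as more effective require a syntactic parser and are not shown.

```python
import math
from collections import Counter


def bigrams(text):
    """Lowercase, split on whitespace, and return padded word bigrams."""
    tokens = ["<s>"] + text.lower().split() + ["</s>"]
    return list(zip(tokens, tokens[1:]))


class BigramModel:
    """Word-bigram language model with add-one (Laplace) smoothing.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self, texts, vocab):
        self.vocab_size = len(vocab)
        self.bigram_counts = Counter()
        self.context_counts = Counter()
        for text in texts:
            for prev, word in bigrams(text):
                self.bigram_counts[(prev, word)] += 1
                self.context_counts[prev] += 1

    def log_prob(self, text):
        """Sum of smoothed log P(word | prev) over the bigrams of `text`.

        Out-of-vocabulary words simply get zero counts, so they are
        handled only approximately by the smoothing.
        """
        total = 0.0
        for prev, word in bigrams(text):
            num = self.bigram_counts[(prev, word)] + 1
            den = self.context_counts[prev] + self.vocab_size
            total += math.log(num / den)
        return total


# Invented toy edits, for illustration only (not data from the paper).
vandal_edits = ["this page is so stupid lol", "you are so stupid lol"]
regular_edits = [
    "the city is located in the north",
    "the river is located in the valley",
]

# Shared vocabulary so both models smooth over the same event space.
vocab = {w for t in vandal_edits + regular_edits for _, w in bigrams(t)} | {"<s>"}
vandal_lm = BigramModel(vandal_edits, vocab)
regular_lm = BigramModel(regular_edits, vocab)


def llr(text):
    """Log-likelihood ratio: positive means the vandalism model fits better."""
    return vandal_lm.log_prob(text) - regular_lm.log_prob(text)


def classify(text):
    return "vandalism" if llr(text) > 0 else "regular"
```

In a realistic setting the log-likelihood ratio would more likely serve as one feature among several (alongside metadata and, per the abstract, PCFG-derived scores) rather than as a standalone decision rule.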
This publication has been cited 3 times, but none of the citing publications currently has an article in WikiPapers.