Calton Pu is an author.


Title: A content-context-centric approach for detecting vandalism in Wikipedia
Keywords: Collaborative online social media; Top-ranked co-occurrence probability; Vandalism detection; WWW co-occurrence probability
Published in: Proceedings of the 9th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing, COLLABORATECOM 2013
Language: English
Date: 2013
Abstract: Collaborative online social media (CSM) applications such as Wikipedia have not only revolutionized the World Wide Web, but they also have had a hugely positive effect on modern free societies. Unfortunately, Wikipedia has also become a target of a wide variety of vandalism attacks. Most existing vandalism detection techniques rely upon simple textual features such as the existence of abusive language or spammy words. These techniques are ineffective against sophisticated vandal edits, which often do not contain the tell-tale markers associated with vandalism. In this paper, we argue for a context-aware approach for vandalism detection. This paper proposes a content-context-aware vandalism detection framework. The main idea is to quantify how well the words contained in the edit fit into the topic and the existing content of the Wikipedia article. We present two novel metrics, called WWW co-occurrence probability and top-ranked co-occurrence probability, for this purpose. We also develop efficient mechanisms for evaluating these two metrics, and machine learning-based schemes that utilize these metrics. The paper presents a range of experiments to demonstrate the effectiveness of the proposed approach.
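The abstract above does not define its two metrics, so the following is only a rough illustration of the general idea of scoring how well an edit's words fit an article's existing words. The function name `cooccurrence_probability`, the toy corpus, and the averaging scheme are all assumptions made for this sketch; they are not the paper's WWW or top-ranked co-occurrence probability.

```python
from itertools import product


def cooccurrence_probability(edit_words, article_words, corpus):
    """Hypothetical fit score, not the paper's metric.

    For every (edit word, article word) pair, take the fraction of corpus
    documents containing the article word that also contain the edit word,
    then average over all pairs. `corpus` is a list of sets of tokens.
    """
    scores = []
    for edit_word, article_word in product(set(edit_words), set(article_words)):
        docs_with_article_word = [doc for doc in corpus if article_word in doc]
        if not docs_with_article_word:
            continue  # article word never observed: no evidence either way
        both = sum(1 for doc in docs_with_article_word if edit_word in doc)
        scores.append(both / len(docs_with_article_word))
    return sum(scores) / len(scores) if scores else 0.0


# Toy corpus standing in for a large reference collection (assumed data).
corpus = [
    {"jazz", "music", "band"},
    {"jazz", "saxophone"},
    {"pizza", "cheese"},
]

on_topic = cooccurrence_probability(["saxophone"], ["jazz"], corpus)
off_topic = cooccurrence_probability(["pizza"], ["jazz"], corpus)
print(on_topic, off_topic)
```

In this toy setting an on-topic word scores higher than an off-topic one; a real system would feed such scores to a classifier alongside other features, as the abstract suggests.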

Title: Elusive vandalism detection in Wikipedia: A text stability-based approach
Keywords: Classification; Vandalism detection
Published in: International Conference on Information and Knowledge Management, Proceedings
Language: English
Date: 2010
Abstract: The open collaborative nature of wikis encourages participation of all users, but at the same time exposes their content to vandalism. The current vandalism-detection techniques, while effective against relatively obvious vandalism edits, prove to be inadequate in detecting increasingly prevalent sophisticated (or elusive) vandal edits. We identify a number of vandal edits that can take hours, even days, to correct and propose a text stability-based approach for detecting them. Our approach is focused on the likelihood of a certain part of an article being modified by a regular edit. In addition to text stability, our machine learning-based technique also takes into account edit patterns. We evaluate the performance of our approach on a corpus comprising 15,000 manually labeled edits from the Wikipedia Vandalism PAN corpus. The experimental results show that text stability is able to improve the performance of the selected machine-learning algorithms significantly.

Title: Modeling and implementing collaborative editing systems with transactional techniques
Published in: Proceedings of the 6th International Conference on Collaborative Computing: Networking, Applications and Worksharing, CollaborateCom 2010
Language: English
Date: 2010
Abstract: Many collaborative editing systems have been developed for coauthoring documents. These systems generally have different infrastructures and support a subset of interactions found in collaborative environments. In this paper, we propose a transactional framework with two advantages. First, the framework is generic, as demonstrated by its capability of modeling four types of existing products: RCS, MediaWiki, Google Docs, and Google Wave. Second, the framework can be layered on top of a modern database management system to reuse its transaction processing capabilities for data consistency control in both centralized and replicated editing systems. We detail the programming interfaces and the synchronization protocol of our transactional framework and demonstrate its usage through concrete examples. We also describe a prototype implementation of this framework over Oracle Berkeley DB High Availability, a replicated transactional database management system.

Title: Cosmos: A wiki data management system
Keywords: Version control systems
Published in: WikiSym
Language: English
Date: 2009
Abstract: Wiki applications are becoming increasingly important for knowledge sharing between large numbers of users. To protect against vandalism and recover from damaging edits, wiki applications need to maintain revision histories of all documents. Due to the large amounts of data and traffic, a wiki application needs to store the data economically on disk and process them efficiently. Current wiki data management systems make a trade-off between storage requirement and access time for document update and retrieval. We introduce a new data management system, Cosmos, to balance this trade-off.