Spam

From WikiPapers

spam is included as keyword or extra keyword in 0 datasets, 0 tools and 8 publications.

Datasets

There are no datasets for this keyword.

Tools

There are no tools for this keyword.


Publications

Title Author(s) Published in Language Date Abstract R C
Diversionary comments under political blog posts Wang J.
Yu C.T.
Yu P.S.
Ben Liu
Meng W.
ACM International Conference Proceeding Series English 2012 An important issue that has been neglected so far is the identification of diversionary comments. Diversionary comments under political blog posts are defined as comments that deliberately twist the bloggers' intention and divert the topic to another one. The purpose is to distract readers from the original topic and draw attention to a new topic. Given that political blogs have significant impact on the society, we believe it is imperative to identify such comments. We then categorize diversionary comments into 5 types, and propose an effective technique to rank comments in descending order of being diversionary. To the best of our knowledge, the problem of detecting diversionary comments has not been studied so far. Our evaluation on 2,109 comments under 20 different blog posts from Digg.com shows that the proposed method achieves the high mean average precision (MAP) of 92.6%. Sensitivity analysis indicates that the effectiveness of the method is stable under different parameter settings. 0 0
Autonomous Link Spam Detection in Purely Collaborative Environments Andrew G. West
Avantika Agrawal
Phillip Baker
Brittney Exline
Insup Lee
WikiSym English October 2011 Collaborative models (e.g., wikis) are an increasingly prevalent Web technology. However, the open-access that defines such systems can also be utilized for nefarious purposes. In particular, this paper examines the use of collaborative functionality to add inappropriate hyperlinks to destinations outside the host environment (i.e., link spam). The collaborative encyclopedia, Wikipedia, is the basis for our analysis.

Recent research has exposed vulnerabilities in Wikipedia's link spam mitigation, finding that human editors are latent and dwindling in quantity. To this end, we propose and develop an autonomous classifier for link additions. Such a system presents unique challenges. For example, low barriers-to-entry invite a diversity of spam types, not just those with economic motivations. Moreover, issues can arise with how a link is presented (regardless of the destination).

In this work, a spam corpus is extracted from over 235,000 link additions to English Wikipedia. From this, 40+ features are codified and analyzed. These indicators are computed using "wiki" metadata, landing site analysis, and external data sources. The resulting classifier attains 64% recall at 0.5% false-positives (ROC-AUC=0.97). Such performance could enable egregious link additions to be blocked automatically with low false-positive rates, while prioritizing the remainder for human inspection. Finally, a live Wikipedia implementation of the technique has been developed.
0 0
Link Spamming Wikipedia for Profit Andrew G. West
Jian Chang
Krishna Venkatasubramanian
Oleg Sokolsky
Insup Lee
CEAS '11: Proc. of the 8th Annual Collaboration, Electronic Messaging, Anti-Abuse, and Spam Conference English September 2011 Collaborative functionality is an increasingly prevalent web technology. To encourage participation, these systems usually have low barriers-to-entry and permissive privileges. Unsurprisingly, ill-intentioned users try to leverage these characteristics for nefarious purposes. In this work, a particular abuse is examined -- link spamming -- the addition of promotional or otherwise inappropriate hyperlinks.

Our analysis focuses on the wiki model and the collaborative encyclopedia, Wikipedia, in particular. A principal goal of spammers is to maximize *exposure*, the quantity of people who view a link. Creating and analyzing the first Wikipedia link spam corpus, we find that existing spam strategies perform quite poorly in this regard. The status quo spamming model relies on link persistence to accumulate exposures, a strategy that fails given the diligence of the Wikipedia community. Instead, we propose a model that exploits the latency inherent in human anti-spam enforcement.

Statistical estimation suggests our novel model would produce significantly more link exposures than status quo techniques. More critically, the strategy could prove economically viable for perpetrators, incentivizing its exploitation. To this end, we address mitigation strategies.
0 0
Link spamming Wikipedia for profit Andrew G. West
Jian Chang
Krishna Venkatasubramanian
Oleg Sokolsky
Insup Lee
CEAS English 2011 Collaborative functionality is an increasingly prevalent web technology. To encourage participation, these systems usually have low barriers-to-entry and permissive privileges. Unsurprisingly, ill-intentioned users try to leverage these characteristics for nefarious purposes. In this work, a particular abuse is examined -- link spamming -- the addition of promotional or otherwise inappropriate hyperlinks.

Our analysis focuses on the wiki model and the collaborative encyclopedia, Wikipedia, in particular. A principal goal of spammers is to maximize *exposure*, the quantity of people who view a link. Creating and analyzing the first Wikipedia link spam corpus, we find that existing spam strategies perform quite poorly in this regard. The status quo spamming model relies on link persistence to accumulate exposures, a strategy that fails given the diligence of the Wikipedia community. Instead, we propose a model that exploits the latency inherent in human anti-spam enforcement.

Statistical estimation suggests our novel model would produce significantly more link exposures than status quo techniques. More critically, the strategy could prove economically viable for perpetrators, incentivizing its exploitation. To this end, we address mitigation strategies.
0 0
Vandalism detection in Wikipedia: A high-performing, feature-rich model and its reduction through Lasso Sara Javanmardi
David W. McDonald
Lopes C.V.
WikiSym 2011 Conference Proceedings - 7th Annual International Symposium on Wikis and Open Collaboration English 2011 User generated content (UGC) constitutes a significant fraction of the Web. However, some wiki-based sites, such as Wikipedia, are so popular that they have become a favorite target of spammers and other vandals. In such popular sites, human vigilance is not enough to combat vandalism, and tools that detect possible vandalism and poor-quality contributions become a necessity. The application of machine learning techniques holds promise for developing efficient online algorithms for better tools to assist users in vandalism detection. We describe an efficient and accurate classifier that performs vandalism detection in UGC sites. We show the results of our classifier in the PAN Wikipedia dataset. We explore the effectiveness of a combination of 66 individual features that produce an AUC of 0.9553 on a test dataset - the best result to our knowledge. Using Lasso optimization we then reduce our feature-rich model to a much smaller and more efficient model of 28 features that performs almost as well - the drop in AUC being only 0.005. We describe how this approach can be generalized to other user generated content systems and describe several applications of this classifier to help users identify potential vandalism. 0 0
A framework for co-classification of articles and users in Wikipedia LeBo Liu
Tan P.-N.
Proceedings - 2010 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2010 English 2010 The massive size of Wikipedia and the ease with which its content can be created and edited has made Wikipedia an interesting domain for a variety of classification tasks, including topic detection, spam detection, and vandalism detection. These tasks are typically cast into a link-based classification problem, in which the class label of an article or a user is determined from its content-based and link-based features. Prior works have focused primarily on classifying either the editors or the articles (but not both). Yet there are many situations in which the classification can be aided by knowing collectively the class labels of the users and articles (e.g., spammers are more likely to post spam content than non-spammers). This paper presents a novel framework to jointly classify the Wikipedia articles and editors, assuming there are correspondences between their classes. Our experimental results demonstrate that the proposed co-classification algorithm outperforms classifiers that are trained independently to predict the class labels of articles and editors. 0 0
Expanding communication mechanisms: They're not just E-mailing anymore Murnan C.A. Proceedings of the 34th Annual ACM SIGUCCS Fall 2006 Conference, SIGUCCS '06 English 2006 Students are walking around with cell phones, making calls and text-messaging. For many, this has now become their main communication mechanism with friends and family. College faculty and staff still count on e-mail as the main communication tool, amongst themselves and with students. Student demand for email accounts from new students before they even arrive on campus has increased exponentially in the past couple of years. Web pages are used to provide information to the outside community and internally, across campus. Web pages have often become the main mechanism for providing step-by-step documentation. Meanwhile, wikis, blogs and MySpace® have entered the online communication world. Students look at our web pages, but how often? They all have college-provided e-mail accounts, but do they use them? What is the best mechanism these days to get the word out, and what will be the mechanism in the future? This paper will explore the mechanisms and approaches that students, and others on campus, are using to communicate now, and will present thoughts on where we're going in the future and the impact that will have on user services. Copyright 2005 ACM. 0 0
Expanding communication mechanisms: they're not just e-mailing anymore Murnan C.A. Proceedings ACM SIGUCCS User Services Conference English 2006 Students are walking around with cell phones, making calls and text-messaging. For many, this has now become their main communication mechanism with friends and family. College faculty and staff still count on e-mail as the main communication tool, amongst themselves and with students. Student demand for e-mail accounts from new students before they even arrive on campus has increased exponentially in the past couple of years. Web pages are used to provide information to the outside community and internally, across campus. Web pages have often become the main mechanism for providing step-by-step documentation. Meanwhile, wikis, blogs and MySpace have entered the online communication world. Students look at our web pages, but how often? They all have college-provided e-mail accounts, but do they use them? What is the best mechanism these days to get the word out, and what will be the mechanism in the future? This paper will explore the mechanisms and approaches that students, and others on campus, are using to communicate now, and will present thoughts on where we're going in the future and the impact that will have on user services. Copyright 2006 ACM. 0 0