Sara Javanmardi

Sara Javanmardi is an author.

Publications

Only those publications related to wikis are shown here.
Title Keyword(s) Published in Language Date Abstract R C
Distributed tuning of machine learning algorithms using MapReduce Clusters Hyper-parameter
Machine learning
MapReduce
Optimization
Tuning
Proceedings of the 3rd Workshop on Large Scale Data Mining: Theory and Applications, LDMTA 2011 - Held in Conjunction with ACM SIGKDD 2011 English 2011 Obtaining the best accuracy in machine learning usually requires carefully tuning learning algorithm parameters for each problem. Parameter optimization is computationally challenging for learning methods with many hyperparameters. In this paper we show that MapReduce Clusters are particularly well suited for parallel parameter optimization. We use MapReduce to optimize regularization parameters for boosted trees and random forests on several text problems: three retrieval ranking problems and a Wikipedia vandalism problem. We show how model accuracy improves as a function of the percent of parameter space explored, that accuracy can be hurt by exploring parameter space too aggressively, and that there can be significant interaction between parameters that appear to be independent. Our results suggest that MapReduce is a two-edged sword: it makes parameter optimization feasible on a massive scale that would have been unimaginable just a few years ago, but also creates a new opportunity for overfitting that can reduce accuracy and lead to inferior learning parameters. 0 0
Finding patterns in behavioral observations by automatically labeling forms of wikiwork in Barnstars Behavioral patterns
Multi-label learning
Wikipedia
WikiSym 2011 Conference Proceedings - 7th Annual International Symposium on Wikis and Open Collaboration English 2011 Our everyday observations about the behaviors of others around us shape how we decide to act or interact. In social media the ability to observe and interpret others' behavior is limited. This work describes one approach to leverage everyday behavioral observations to develop tools that could improve understanding and sense-making capabilities of contributors, managers and researchers of social media systems. One example of behavioral observation is Wikipedia Barnstars. Barnstars are a type of award recognizing the activities of Wikipedia editors. We mine the entire English Wikipedia to extract barnstar observations. We develop a multi-label classifier based on a random forest technique to recognize and label distinct forms of observed and acknowledged activity. We evaluate the classifier through several means including use of separate training and testing datasets and by the application of the classifier to previously unlabeled data. We use the classifier to identify Wikipedia editors who have been observed with some predominant types of behavior and explore whether those patterns of behavior are evident and how observers seem to be making the observations. We discuss how these types of activity observations can be used to develop tools and potentially improve understanding and analysis in wikis and other online communities. 0 1
Multi-label classification of short text: A study on Wikipedia barnstars AAAI Workshop - Technical Report English 2011 A content analysis of Wikipedia barnstars, personalized tokens of appreciation given to participants, reveals a wide range of valued work extending beyond simple editing to include social support, administrative actions, and types of articulation work. Barnstars are examples of short semi-structured text characterized by informal grammar and language. We propose a method to classify these barnstars, which contain items of interest, into various work type categories. We evaluate several multi-label text categorization classifiers and show that significant performance can be achieved by simple classifiers using features which carry context extracted from barnstars. Although this study focused specifically on work categorization via barnstar content for Wikipedia, we believe that the findings are applicable to other similar collaborative systems. Copyright © 2011, Association for the Advancement of Artificial Intelligence. All rights reserved. 0 0
Vandalism detection in Wikipedia: A high-performing, feature-rich model and its reduction through Lasso Lasso
Random forests
Vandalism detection
Wikipedia
WikiSym 2011 Conference Proceedings - 7th Annual International Symposium on Wikis and Open Collaboration English 2011 User generated content (UGC) constitutes a significant fraction of the Web. However, some wiki-based sites, such as Wikipedia, are so popular that they have become a favorite target of spammers and other vandals. In such popular sites, human vigilance is not enough to combat vandalism, and tools that detect possible vandalism and poor-quality contributions become a necessity. The application of machine learning techniques holds promise for developing efficient online algorithms for better tools to assist users in vandalism detection. We describe an efficient and accurate classifier that performs vandalism detection in UGC sites. We show the results of our classifier on the PAN Wikipedia dataset. We explore the effectiveness of a combination of 66 individual features that produce an AUC of 0.9553 on a test dataset - the best result to our knowledge. Using Lasso optimization we then reduce our feature-rich model to a much smaller and more efficient model of 28 features that performs almost as well - the drop in AUC being only 0.005. We describe how this approach can be generalized to other user generated content systems and describe several applications of this classifier to help users identify potential vandalism. 0 0
Vandalism detection in Wikipedia: a high-performing, feature-rich model and its reduction through Lasso Lasso
Wikipedia
Random forests
Vandalism detection
WikiSym English 2011 0 0
Modeling user reputation in wikis Web 2.0
Wiki
Wiki mining
Wikipedia
Reliability
Reputation
Statistical Analysis and Data Mining English 2010 Collaborative systems available on the Web allow millions of users to share information through a growing collection of tools and platforms such as wikis, blogs, and shared forums. By their very nature, these systems contain resources and information with different quality levels. The open nature of these systems, however, makes it difficult for users to determine the quality of the available information and the reputation of its providers. Here, we first parse and mine the entire English Wikipedia history pages in order to extract detailed user edit patterns and statistics. We then use these patterns and statistics to derive three computational models of a user's reputation. Finally, we validate these models using ground-truth Wikipedia data associated with vandals and administrators. When used as a classifier, the best model produces an area under the receiver operating characteristic (ROC) curve (AUC) of 0.98. Furthermore, we assess the reputation predictions generated by the models on other users, and show that all three models can be used efficiently for predicting user behavior in Wikipedia. 0 2
Statistical measure of quality in Wikipedia Wikipedia
Collaborative authoring
Crowdsourcing
Groupware
Web 2.0
Wiki
SOMA English 2010 0 2
CalSWIM: A wiki-based data sharing platform Data-sharing
Knowledge management
Wiki
Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering English 2009 Organizations increasingly create massive internal digital data repositories and are looking for technical advances in managing, exchanging and integrating explicit knowledge. While most of the enabling technologies for knowledge management have been around for several years, the ability to bring cost-effective data sharing, integration and analysis into a cohesive infrastructure evaded organizations until the advent of Web 2.0 applications. In this paper, we discuss our investigations into using a Wiki as a web-based interactive knowledge management system, which is integrated with some features for easy data access, data integration and analysis. Using the enhanced wiki, it is possible to make organizational knowledge sustainable, expandable, outreaching and continually up-to-date. The wiki is currently in use as the California Sustainable Watershed Information Manager. We evaluate our work according to the requirements of knowledge management systems. The result shows that our solution satisfies more requirements compared to other tools. 0 0
Leveraging crowdsourcing heuristics to improve search in Wikipedia WikiSym English 2009 0 0
Review-Based Ranking of Wikipedia Articles Wikipedia
Search
Ranking
CASON English 2009 0 0
Review-based ranking of Wikipedia articles Ranking
Search
Wikipedia
CASoN 2009 - International Conference on Computational Aspects of Social Networks English 2009 Wikipedia, the largest encyclopedia on the Web, is often seen as the most successful example of crowdsourcing. The encyclopedic knowledge it accumulated over the years is so large that one often uses search engines to find information in it. In contrast to regular Web pages, Wikipedia is fairly structured, and articles are usually accompanied by history pages, categories and talk pages. The meta-data available in these pages can be analyzed to gain a better understanding of the content and quality of the articles. We discuss how the rich meta-data available in wiki pages can be used to provide better search results in Wikipedia. Building on the studies on "Wisdom of Crowd" and the effectiveness of the knowledge collected by a large number of people, we investigate the effect of incorporating the extent of review of an article in the quality of rankings of the search results. The extent of review is measured by the number of distinct editors who contributed to the articles and is extracted by processing Wikipedia's history pages. We compare different ranking algorithms that explore combinations of text-relevancy, PageRank, and extent of review. The results show that the review-based ranking algorithm which combines the extent of review and text-relevancy outperforms the rest; it is more accurate and less computationally expensive compared to PageRank-based rankings. 0 0
User contribution and trust in Wikipedia English 2009 Wikipedia, one of the top ten most visited websites, is commonly viewed as the largest online reference for encyclopedic knowledge. Because of its open editing model, allowing anyone to enter and edit content, Wikipedia's overall quality has often been questioned as a source of reliable information. Lack of study of the open editing model of Wikipedia and its effectiveness has resulted in a new generation of wikis that restrict contributions to registered users only, using their real names. In this paper, we present an empirical study of user contributions to Wikipedia. We statistically analyze contributions by both anonymous and registered users. The results show that submissions of anonymous and registered users in Wikipedia suggest a power law behavior. About 80% of the revisions are submitted by fewer than 7% of the users, most of whom are registered users. To further refine the analyses, we use the Wiki Trust Model (WTM), a user reputation model developed in our previous work, to assign a reputation value to each user. As expected, the results show that registered users contribute higher quality content and therefore are assigned higher reputation values. However, a significant number of anonymous users also contribute high-quality content. We provide further evidence that regardless of a user's attribution, registered or anonymous, high reputation users are the dominant contributors that actively edit Wikipedia articles in order to remove vandalism or poor quality content. 0 2