Anwitaman Datta

From WikiPapers

Anwitaman Datta is an author.


Only those publications related to wikis are shown here.
Title: A preliminary study on the effects of barnstars on Wikipedia editing
Keywords: Barnstars; Editing behaviour
Published in: Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013
Language: English
Date: 2013
Abstract: This paper presents a preliminary study of the awarding of barnstars among Wikipedia editors, to better understand their motivations for contributing to Wikipedia articles. We crawled the talk pages of all active Wikipedia editors and retrieved 21,299 barnstars awarded among 14,074 editors. In particular, we found that editors do not award and receive barnstars in equal (or similar) quantities. Also, editors were more active in editing articles before awarding or receiving barnstars.
Title: Interest classification of Twitter users using Wikipedia
Keywords: Social network; User interest
Published in: Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013
Language: English
Date: 2013
Abstract: We present a framework for automatically classifying the relative interests of Twitter users using information from Wikipedia. Our proposed framework first uses Wikipedia to automatically classify a user's celebrity followings into various interest categories, then determines the relative interests of the user by weighting each interest against his/her other interests. Our preliminary evaluation on Twitter shows that this framework is able to correctly classify users' interests, and that these users frequently converse about topics that reflect both their (detected) interest and a related real-life event.
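The two-stage pipeline this abstract describes (map a user's celebrity followings to interest categories, then weight each category against the rest) can be sketched roughly as follows. The category map, account names, and counting scheme below are illustrative stand-ins, not data or details from the paper.

```python
from collections import Counter

# Hypothetical mapping from followed celebrity accounts to
# Wikipedia-derived interest categories (illustrative only).
CELEBRITY_CATEGORY = {
    "@novak_djokovic": "Sports",
    "@serena_williams": "Sports",
    "@nasa": "Science",
    "@neiltyson": "Science",
    "@rihanna": "Music",
}

def relative_interests(followings):
    """Classify each followed celebrity into a category, then
    return each category's share of the user's classified followings."""
    counts = Counter(
        CELEBRITY_CATEGORY[f] for f in followings if f in CELEBRITY_CATEGORY
    )
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}

weights = relative_interests(
    ["@nasa", "@neiltyson", "@rihanna", "@novak_djokovic"]
)
# Science gets weight 0.5; Sports and Music get 0.25 each
```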
Title: Twevent: Segment-based event detection from tweets
Keywords: Event detection; Tweet segmentation
Published in: ACM International Conference Proceeding Series
Language: English
Date: 2012
Abstract: Event detection from tweets is an important task for understanding the current events and topics attracting large numbers of users. However, the unique characteristics of tweets (e.g., short and noisy content, diverse and fast-changing topics, and large data volume) make event detection challenging, and most existing techniques proposed for well-written documents (e.g., news articles) cannot be directly adopted. In this paper, we propose a segment-based event detection system for tweets, called Twevent. Twevent first detects bursty tweet segments as event segments and then clusters the event segments into events, considering both their frequency distribution and content similarity. More specifically, each tweet is split into non-overlapping segments (i.e., phrases that possibly refer to named entities or semantically meaningful information units). The bursty segments are identified within a fixed time window based on their frequency patterns, and each bursty segment is described by the set of tweets containing the segment published within that time window. The similarity between a pair of bursty segments is computed using their associated tweets. After clustering bursty segments into candidate events, Wikipedia is exploited to identify the realistic events and to derive the most newsworthy segments describing them. We evaluate Twevent against the state-of-the-art method using 4.3 million tweets published by Singapore-based users in June 2010. In our experiments, Twevent outperforms the state-of-the-art method by a large margin in both precision and recall. More importantly, the events detected by Twevent can be easily interpreted with little background knowledge because of the newsworthy segments. We also show that Twevent is efficient and scalable, making it a desirable solution for event detection from tweets.
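The burst-detection step in the abstract (flagging segments whose frequency in a time window far exceeds their usual rate) might be sketched as below. The burst threshold, the toy counts, and the per-tweet background rates are assumptions for illustration, not the paper's actual model or parameters.

```python
def bursty_segments(window_counts, background_rate, window_size, threshold=3.0):
    """Flag segments whose observed frequency in the current time
    window exceeds `threshold` times the count expected from the
    segment's long-run background rate (a simplified burst test)."""
    bursty = []
    for segment, observed in window_counts.items():
        expected = background_rate.get(segment, 0.0) * window_size
        if expected == 0 or observed / expected >= threshold:
            bursty.append(segment)
    return bursty

# Toy example: "world cup" spikes well above its background rate,
# while "good morning" appears at its usual frequency.
counts = {"world cup": 90, "good morning": 30}
rates = {"world cup": 0.02, "good morning": 0.03}  # per-tweet rates
print(bursty_segments(counts, rates, window_size=1000))
# -> ['world cup']
```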
Title: TwiNER: Named entity recognition in targeted Twitter stream
Keywords: Named entity recognition; Web n-gram
Published in: SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval
Language: English
Date: 2012
Abstract: Many private and public organizations have been reported to create and monitor targeted Twitter streams to collect and understand users' opinions about the organizations. A targeted Twitter stream is usually constructed by filtering tweets with user-defined selection criteria, e.g., tweets published by users from a selected region, or tweets matching one or more predefined keywords. There is an emerging need for early crisis detection and response with such targeted streams. Such applications require a good named entity recognition (NER) system for Twitter, one able to automatically discover emerging named entities potentially linked to the crisis. In this paper, we present a novel two-step unsupervised NER system for targeted Twitter streams, called TwiNER. In the first step, it leverages the global context obtained from Wikipedia and the Web N-Gram corpus to partition tweets into valid segments (phrases) using a dynamic programming algorithm; each such tweet segment is a candidate named entity. We observe that the named entities in a targeted stream usually exhibit a gregarious property, due to the way the stream is constructed. In the second step, TwiNER constructs a random walk model to exploit this gregarious property in the local context derived from the Twitter stream; highly ranked segments have a higher chance of being true named entities. We evaluated TwiNER on two sets of real-life tweets simulating two targeted streams. Against labeled ground truth, TwiNER achieves performance comparable to conventional approaches in both streams. Various settings of TwiNER have also been examined to verify our combination of global and local context.
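The second step's random-walk ranking over segments can be approximated with plain power-iteration PageRank on a segment co-occurrence graph. The toy graph, damping factor, and iteration count below are my assumptions, not the paper's actual construction.

```python
def rank_segments(graph, damping=0.85, iters=50):
    """Power-iteration PageRank over a segment co-occurrence
    graph given as {node: [neighbours]}; returns nodes ranked
    from most to least central."""
    nodes = list(graph)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {}
        for v in nodes:
            # Mass flowing into v from every node that links to it.
            incoming = sum(
                score[u] / len(graph[u]) for u in nodes if v in graph[u]
            )
            new[v] = (1 - damping) / n + damping * incoming
        score = new
    return sorted(score, key=score.get, reverse=True)

# Toy graph: "changi airport" co-occurs with more segments,
# so it should rank above the loosely connected ones.
g = {
    "changi airport": ["flight delay", "singapore"],
    "flight delay": ["changi airport"],
    "singapore": ["changi airport"],
}
print(rank_segments(g)[0])
# -> changi airport
```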
Title: A generalized method for word sense disambiguation based on Wikipedia
Keywords: Context pruning; Word sense disambiguation
Published in: ECIR
Language: English
Date: 2011
Title: COBS: Realizing decentralized infrastructure for collaborative browsing and search
Published in: Proceedings - International Conference on Advanced Information Networking and Applications, AINA
Language: English
Date: 2011
Abstract: Finding relevant and reliable information on the web is a non-trivial task. While internet search engines do find correct web pages with respect to a set of keywords, they often cannot ensure the relevance or reliability of their content. An emerging trend is to harness internet users, in the spirit of Web 2.0, to discern and personalize relevant and reliable information. Users collaboratively search or browse for information, either directly by communicating or indirectly by adding meta-information (e.g., tags) to web pages. While gaining much popularity, such approaches are bound to specific service providers, or to the Web 2.0 sites providing the necessary features, and the knowledge so generated is confined to, and subject to the whims and censorship of, such providers. To overcome these limitations we introduce COBS, a browser-centric knowledge repository which enjoys inherent openness (similar to Wikipedia) while aiming to provide end users the freedom of personalization and privacy by adopting an eventually hybrid/P2P back-end. In this paper we first present the COBS front-end, a browser add-on that enables users to tag, rate, or comment on arbitrary web pages and to socialize with others in both a synchronous and an asynchronous manner. We then discuss how a decentralized back-end can be realized. While Distributed Hash Tables (DHTs) are the most natural choice, and despite a decade of research on DHT designs, we encounter several shortcomings, some small and others more fundamental, that need to be surmounted in order to realize an efficient, scalable, and reliable decentralized back-end for COBS. To that end, we outline various design alternatives and discuss qualitatively (and quantitatively, when possible) their advantages and disadvantages. We believe that the objectives of COBS are ambitious, posing significant challenges for distributed systems, middleware, and distributed data-analytics research, even while building on existing momentum. Based on experiences from our ongoing work on COBS, we outline these systems research issues in this position paper.
Title: Semantic tag recommendation using concept model
Keywords: Concept model; Semantic tag; Tag recommendation
Published in: SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval
Language: English
Date: 2011
Abstract: The common tags given by multiple users to a particular document are often semantically relevant to the document, and each tag represents a specific topic. In this paper, we attempt to emulate human tagging behavior to recommend tags by considering the concepts contained in documents. Specifically, we represent each document using the few most relevant concepts it contains, where the concept space is derived from Wikipedia. Tags are then recommended based on the tag concept model derived from the annotated documents of each tag. Evaluated on a Delicious dataset of more than 53K documents, the proposed technique achieved tag recommendation accuracy comparable to the state-of-the-art, while yielding an order-of-magnitude speed-up.
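The recommendation step described here, scoring each candidate tag by how well its concept model matches the document's concept vector, reduces to a similarity ranking. The cosine measure and the toy concept vectors below are illustrative assumptions, not the paper's exact scoring function or data.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse concept vectors
    represented as {concept: weight} dicts."""
    dot = sum(a[c] * b[c] for c in a if c in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend_tags(doc_concepts, tag_models, k=2):
    """Rank tags by similarity between the document's concept
    vector and each tag's aggregated concept model."""
    ranked = sorted(
        tag_models,
        key=lambda t: cosine(doc_concepts, tag_models[t]),
        reverse=True,
    )
    return ranked[:k]

# Toy document and tag concept models over a Wikipedia-like
# concept space (made-up weights).
doc = {"Machine learning": 0.8, "Statistics": 0.4}
models = {
    "ml": {"Machine learning": 0.9, "Statistics": 0.3},
    "cooking": {"Cuisine": 1.0},
}
print(recommend_tags(doc, models, k=1))
# -> ['ml']
```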
Title: WikiTeams: How do they achieve success?
Published in: IEEE Potentials
Language: English
Date: 2011
Abstract: Web 2.0 technology and so-called social media are among the most popular (among users and researchers alike) Internet technologies today. Among them, wiki technology - created to simplify HTML editing and enable open, collaborative editing of pages by ordinary Web users - occupies an important place. Wikis are increasingly adopted by businesses as a useful form of knowledge management and sharing, creating "corporate wikis." However, the most widely known application of wiki technology - Wikipedia - is, according to many analysts, more than just an open encyclopedia that uses a wiki.
Title: Do Wikipedians follow domain experts? A domain-specific study on Wikipedia knowledge building
Keywords: Contributing behavior; Knowledge building
Published in: Proceedings of the ACM International Conference on Digital Libraries
Language: English
Date: 2010
Abstract: Wikipedia is one of the most successful online knowledge bases, attracting millions of visits daily. Not surprisingly, its huge success has in turn led to immense research interest in better understanding the collaborative knowledge building process. In this paper, we performed a domain-specific (terrorism) case study, comparing and contrasting the knowledge evolution in Wikipedia with a knowledge base created by domain experts. Specifically, we used the Terrorism Knowledge Base (TKB) developed by experts at MIPT. We identified 409 Wikipedia articles matching TKB records and studied them from three aspects: creation, revision, and link evolution. We found that the knowledge building in Wikipedia had largely been independent and did not follow the TKB - despite the open and online availability of the latter, and the awareness of at least some Wikipedia contributors of the TKB source. In an attempt to identify possible reasons, we conducted a detailed analysis of the contribution behavior demonstrated by Wikipedians. We found that most Wikipedians each contribute to a relatively small set of articles, with their contribution biased towards one or very few articles. At the same time, each article's contributions are often championed by very few active contributors, including the article's creator. We finally arrive at the conjecture that contributions in Wikipedia aim more to cover knowledge at the article level than at the domain level.
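The finding that each Wikipedian's contribution is biased towards one or very few articles can be quantified by, for example, the share of an editor's edits that go to their single most-edited article. The edit counts below are made up for illustration; this is one plausible concentration measure, not necessarily the one used in the paper.

```python
def top_article_share(edit_counts):
    """Fraction of an editor's edits concentrated in their single
    most-edited article (1.0 means every edit went to one article)."""
    total = sum(edit_counts.values())
    return max(edit_counts.values()) / total if total else 0.0

# Hypothetical editor: 45 of 50 edits on a single article,
# indicating a strongly biased contribution pattern.
edits = {"Article A": 45, "Article B": 3, "Article C": 2}
print(top_article_share(edits))
# -> 0.9
```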
Title: Visualizing and exploring evolving information networks in Wikipedia
Published in: ICADL
Language: English
Date: 2010
Title: SSnetViz: A visualization engine for heterogeneous semantic social networks
Keywords: Semantic social network; Social network exploration
Published in: ACM International Conference Proceeding Series
Language: English
Date: 2009
Abstract: SSnetViz is an ongoing research effort to design and implement a visualization engine for heterogeneous semantic social networks. A semantic social network is a multi-modal network that contains nodes representing different types of people or object entities, and edges representing relationships among them. When multiple heterogeneous semantic social networks are to be visualized together, SSnetViz provides a suite of functions to store them and to integrate them for searching and analysis. We illustrate these functions using social networks related to terrorism research, one crafted by domain experts and another derived from Wikipedia.
Title: On visualizing heterogeneous semantic networks from multiple data sources
Published in: Lecture Notes in Computer Science
Language: English
Date: 2008
Abstract: In this paper, we focus on the visualization of heterogeneous semantic networks obtained from multiple data sources. A semantic network comprising a set of entities and relationships is often used for representing knowledge derived from textual data or database records. Although the semantic networks created for the same domain at different data sources may cover a similar set of entities, these networks can also be very different because of naming conventions, coverage, viewpoints, and other reasons. Since digital libraries often contain data from multiple sources, we propose a visualization tool to integrate and analyze the differences among multiple social networks. Through a case study on two terrorism-related semantic networks, derived from Wikipedia and the Terrorism Knowledge Base (TKB) respectively, the effectiveness of our proposed visualization tool is demonstrated.
Title: WikiNetViz: Visualizing friends and adversaries in implicit social networks
Keywords: Controversy; Visual analytics
Published in: IEEE International Conference on Intelligence and Security Informatics, 2008, IEEE ISI 2008
Language: English
Date: 2008
Abstract: When multiple users with diverse backgrounds and beliefs edit Wikipedia together, disputes often arise due to disagreements among the users. In this paper, we introduce a novel visualization tool, WikiNetViz, to visualize and analyze disputes among users in a dispute-induced social network. WikiNetViz is designed to quantify the degree of dispute between a pair of users using the article history. Each user (and article) is also assigned a controversy score by our proposed ControversyRank model, which measures the degree of controversy of a user (or article) by the amount of dispute between the user (article) and other users in articles of varying degrees of controversy. On the constructed social network, WikiNetViz can perform clustering so as to visualize the dynamics of disputes at the user-group level. It also provides an article viewer for examining an article revision, so as to determine the article content modified by different users.
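The mutual reinforcement described for ControversyRank (a user is more controversial when involved in disputes within controversial articles, and vice versa) can be sketched as an alternating update. The dispute counts, the max-normalization, and the fixed iteration count below are simplified assumptions rather than the paper's exact model.

```python
def controversy_rank(disputes, iters=20):
    """Alternately update user and article controversy scores:
    an article's score grows with disputes by controversial users,
    and a user's score grows with disputes in controversial articles.
    `disputes[user][article]` counts disputes that user had there."""
    users = list(disputes)
    articles = sorted({a for d in disputes.values() for a in d})
    u = {x: 1.0 for x in users}
    a = {x: 1.0 for x in articles}
    for _ in range(iters):
        a = {art: sum(u[usr] * disputes[usr].get(art, 0) for usr in users)
             for art in articles}
        norm = max(a.values()) or 1.0
        a = {k: v / norm for k, v in a.items()}
        u = {usr: sum(a[art] * c for art, c in disputes[usr].items())
             for usr in users}
        norm = max(u.values()) or 1.0
        u = {k: v / norm for k, v in u.items()}
    return u, a

# Toy data: alice disputes heavily on one contested article,
# so she should end up with the highest user controversy score.
d = {"alice": {"Topic X": 5}, "bob": {"Topic X": 1, "Topic Y": 1}}
u_scores, a_scores = controversy_rank(d)
```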