Wikipedia

From WikiPapers

wikipedia is included as keyword or extra keyword in 2 datasets, 2 tools and 358 publications.

Datasets

Dataset Size Language Description
EPIC/Oxford Wikipedia quality assessment English EPIC/Oxford Wikipedia quality assessment This dataset comprises the full, anonymized set of responses from the blind assessment of a sample of Wikipedia articles across languages and disciplines by academic experts. The study was conducted in 2012 by EPIC and the University of Oxford and sponsored by the Wikimedia Foundation.
Wikipedia search data Multilingual Wikipedia search data are logs about search queries by visitors.

Tools

Tool Operating System(s) Language(s) Programming language(s) License Description Image
Wikipedia Recent Changes Map Web English JavaScript Wikipedia Recent Changes Map is a web tool that displays a world map showing anonymous edits to Wikipedia, geolocated by IP.
WikipediaVision Web English WikipediaVision is a web-based tool that shows anonymous edits to Wikipedia (almost) in real-time.


Publications

Title Author(s) Published in Language Date Abstract R C
Art History on Wikipedia, a Macroscopic Observation Doron Goldfarb
Max Arends
Josef Froschauer
Dieter Merkl
ArXiv English 20 April 2013 How are articles about art historical actors interlinked within Wikipedia? Led by this question, we seek an overview of the link structure of a domain-specific subset of Wikipedia articles. We use an established domain-specific person name authority, the Getty Union List of Artist Names (ULAN), in order to externally identify relevant actors. Besides containing consistent biographical person data, this database also provides associative relationships between its person records, serving as a reference link structure for comparison. As a first step, we use mappings between the ULAN and English DBpedia provided by the Virtual International Authority File (VIAF). This way, we are able to identify 18,002 relevant person articles. Examining the link structure between these resources reveals interesting insights about the high-level structure of art historical knowledge as it is represented on Wikipedia. 0 1
Estado del arte de la investigación sobre wikis Emilio J. Rodríguez-Posada
Juan Manuel Dodero-Beardo
University of Cádiz Spanish December 2012 Researchers' interest in wikis, and in Wikipedia in particular, has grown in recent years. The first edition of WikiSym, a symposium on wikis, was held in 2005, and since then a multitude of congresses, workshops, conferences and competitions have appeared in this area. The study of wikis is an emerging and prolific field.

There have been several attempts, with little success, to compile all the literature on wikis. In some cases the approach or the tool used was limited; in others, because of the size of the task, the project was abandoned and the bibliographic metadata were soon lost. In this work we present WikiPapers, a collaborative project to compile all the literature on wikis. It uses MediaWiki and its semantic extension, both well known to researchers in this field. As of November 2012, more than 1,700 publications and their metadata had been compiled, along with documentation on related tools and datasets. The metadata can be exported in BibTeX, RDF, CSV and JSON formats. The complete histories of the wiki are available for download to facilitate its preservation. The project is open to participation by everyone.

The rest of the work is organized as follows. Section 2 motivates this work by reviewing the approaches used so far to compile all the literature on wikis, highlighting their advantages and drawbacks. Section 3 details the objectives. Section 4 defines some terms that help in understanding the content. Section 5 presents WikiPapers, how it works and what steps have been taken. Section 6 surveys the state of the art using WikiPapers. Section 7 reviews the questions that remain open today or that have received little attention so far. Finally, Section 8 closes with conclusions and future work.
0 0
Mass Collaboration or Mass Amateurism? A comparative study on the quality of scientific information produced using Wiki tools and concepts Fernando Rodrigues Universidade Évora Portuguese December 2012 With this PhD dissertation, we intend to contribute to a better understanding of the Wiki phenomenon as a knowledge management system which aggregates private knowledge. We also wish to check to what extent information generated through anonymous and freely bestowed mass collaboration is reliable as opposed to the traditional approach.

In order to achieve that goal, we develop a comparative study between Wikipedia and Encyclopaedia Britannica with regard to accuracy, depth and detail of information in both, in order to confront the quality of the knowledge repository produced by them. That will allow us to reach a conclusion about the efficacy of the business models behind them.

We will use a representative random sample composed of the articles that appear in both encyclopedias. Each pair of articles was previously reformatted and then graded by an expert in its subject area. At the same time, we collected a small convenience sample consisting only of Management articles. Each pair of articles was graded by several experts in order to determine the uncertainty associated with having diverse gradings of the same article and apply it to the evaluations carried out by just one expert. The conclusion was that the average quality of the Wikipedia articles which were analysed was superior to its peers’ and that this difference was statistically significant.

A survey conducted within academia confirmed that only a minority used traditional information sources as their first approach to seeking information. The survey also made clear that reliance on these sources was considerably greater than reliance on information obtained through Wikipedia. This perception of quality, together with the diametrically opposed results of the blind-test evaluation, reinforces the impartiality of the evaluating panel.

However much the chosen sample is representative of the universe to be studied, results have depended on the evaluators’ personal opinion and chosen criteria. This means that the reproducibility of this study’s conclusions using a different grading panel cannot be guaranteed. Nevertheless, this is not enough of a reason to reject the study results obtained through more than five hundred evaluations.

This thesis is thus an attempt to help clarify this topic and to contribute to a better perception of the quality of a tool which is used daily by millions of people, of the mass collaboration which feeds it and of the collaborative software that supports it.
0 0
Wikipédia, espace fluide, espace à parcourir Rémi Mathis La Revue de la BNU French September 2012 Wikipedia is a fundamentally decentred space: it exists in more than 280 languages, its authors number in the hundreds of thousands, and it evolves constantly to keep up with the latest state of knowledge. To ease navigation, entry points are created and tools make it possible to structure this space. The idea, however, is not to impose a path but, on the contrary, to encourage fluid reading through itineraries constantly reinvented by readers, enriching their experience of discovery and leading them to articles they would not have sought out on their own. 0 0
Assessing the accuracy and quality of Wikipedia entries compared to popular online encyclopaedias Imogen Casebourne
Chris Davies
Michelle Fernandes
Naomi Norman
English 2 August 2012 8 0
Citation needed: The dynamics of referencing in Wikipedia Chih-Chun Chen
Camille Roth
WikiSym August 2012 The extent to which a Wikipedia article refers to external sources to substantiate its content can be seen as a measure of its externally invoked authority. We introduce a protocol for characterising the referencing process in the context of general article editing. With a sample of relatively mature articles, we show that referencing does not occur regularly through an article’s lifetime but is associated with periods of more substantial editing, when the article has reached a certain level of maturity (in terms of the number of times it has been revised and its length). References also tend to be contributed by editors who have contributed more frequently and more substantially to an article, suggesting that a subset of more qualified or committed editors may exist for each article. 0 0
Deletion Discussions in Wikipedia: Decision Factors and Outcomes Jodi Schneider
Alexander Passant
Stefan Decker
WikiSym English August 2012 Deletion of articles is a common process in Wikipedia, in order to ensure the overall quality of the encyclopedia. Yet, there is a need to better understand the procedures in order to promote the best decisions without unnecessary community work. In this paper, we study deletion in Wikipedia, drawing from factor analysis, and taking an in-depth, content-analysis-based approach. We address three research questions: First, what factors contribute to the decision about whether to delete a given article? Second, when multiple factors are given, what is the relative importance of those factors? Third, what are the outcomes of deletion discussions, both for articles and for the community? We find that multiple factors contribute to the assessment of an article, and we discuss their relative frequency. Further, we show how the assessment timeline focuses attention on improving borderline articles that have the potential to meet Wikipedia’s content inclusion policies, and we highlight the role of novice contributors in this improvement process. 0 0
Drawing a Data-Driven Portrait of Wikipedia Editors Robert West
Ingmar Weber
Carlos Castillo
WikiSym English August 2012 While there has been a substantial amount of research into the editorial and organizational processes within Wikipedia, little is known about how Wikipedia editors (Wikipedians) relate to the online world in general. We attempt to shed light on this issue by using aggregated log data from Yahoo!’s browser toolbar in order to analyze Wikipedians’ editing behavior in the context of their online lives beyond Wikipedia. We broadly characterize editors by investigating how their online behavior differs from that of other users; e.g., we find that Wikipedia editors search more, read more news, play more games, and, perhaps surprisingly, are more immersed in popular culture. Then we inspect how editors’ general interests relate to the articles to which they contribute; e.g., we confirm the intuition that editors are more familiar with their active domains than average users. Finally, we analyze the data from a temporal perspective; e.g., we demonstrate that a user’s interest in the edited topic peaks immediately before the edit. Our results are relevant as they illuminate novel aspects of what has become many Web users’ prevalent source of information. 0 0
Etiquette in Wikipedia: Weening New Editors into Productive Ones Ryan Faulkner
Steven Walling
Maryana Pinchuk
WikiSym English August 2012 Currently, the greatest challenge faced by the Wikipedia community involves reversing the decline of active editors on the site – in other words, ensuring that the encyclopedia’s contributors remain sufficiently numerous to fill the roles that keep it relevant. Due to the natural drop-off of old contributors, newcomers must constantly be socialized, trained and retained. However recent research has shown the Wikipedia community is failing to retain a large proportion of productive new contributors and implicates Wikipedia’s semi-automated quality control mechanisms and their interactions with these newcomers as an exacerbating factor. This paper evaluates the effectiveness of minor changes to the normative warning messages sent to newcomers from one of the most prolific of these quality control tools (Huggle) in preserving their rate of contribution. The experimental results suggest that substantial gains in newcomer participation can be attained through inexpensive changes to the wording of the first normative message that new contributors receive. 0 0
Identifying controversial articles in Wikipedia: A comparative study Hoda Sepehri Rad
Denilson Barbosa
WikiSym English August 2012 Wikipedia articles are the result of the collaborative editing of a diverse group of anonymous volunteer editors, who are passionate and knowledgeable about specific topics. One can argue that this plurality of perspectives leads to broader coverage of the topic, thus benefitting the reader. On the other hand, differences among editors on polarizing topics can lead to controversial or questionable content, where facts and arguments are presented and discussed to support a particular point of view. Controversial articles are manually tagged by Wikipedia editors, and span many interesting and popular topics, such as religion, history, and politics, to name a few. Recent works have been proposed on automatically identifying controversy within unmarked articles. However, to date, no systematic comparison of these efforts has been made. This is in part because the various methods are evaluated using different criteria and on different sets of articles by different authors, making it hard for anyone to verify the efficacy and compare all alternatives. We provide a first attempt at bridging this gap. We compare five different methods for modelling and identifying controversy, and discuss some of the unique difficulties and opportunities inherent to the way Wikipedia is produced. 0 0
In Search of the Ur-Wikipedia: Universality, Similarity, and Translation in the Wikipedia Inter-Language Link Network Morten Warncke-Wang
Anuradha Uduwage
Zhenhua Dong
John Riedl
WikiSym English August 2012 Wikipedia has become one of the primary encyclopaedic information repositories on the World Wide Web. It started in 2001 with a single edition in the English language and has since expanded to more than 20 million articles in 283 languages. Criss-crossing between the Wikipedias is an interlanguage link network, connecting the articles of one edition of Wikipedia to another. We describe characteristics of articles covered by nearly all Wikipedias and those covered by only a single language edition, we use the network to understand how we can judge the similarity between Wikipedias based on concept coverage, and we investigate the flow of translation between a selection of the larger Wikipedias. Our findings indicate that the relationships between Wikipedia editions follow Tobler's first law of geography: similarity decreases with increasing distance. The number of articles in a Wikipedia edition is found to be the strongest predictor of similarity, while language similarity also appears to have an influence. The English Wikipedia edition is by far the primary source of translations. We discuss the impact of these results for Wikipedia as well as user-generated content communities in general. 0 0
Manypedia: Comparing Language Points of View of Wikipedia Communities Paolo Massa
Federico Scrinzi
WikiSym English August 2012 The 4 million articles of the English Wikipedia have been written in a collaborative fashion by more than 16 million volunteer editors. On each article, the community of editors strive to reach a neutral point of view, representing all significant views fairly, proportionately, and without biases. However, beside the English one, there are more than 280 editions of Wikipedia in different languages and their relatively isolated communities of editors are not forced by the platform to discuss and negotiate their points of view. So the empirical question is: do communities on different language Wikipedias develop their own diverse Linguistic Points of View (LPOV)? To answer this question we created and released as open source Manypedia, a web tool whose aim is to facilitate cross-cultural analysis of Wikipedia language communities by providing an easy way to compare automatically translated versions of their different representations of the same topic. 0 0
Mutual Evaluation of Editors and Texts for Assessing Quality of Wikipedia Articles Yu Suzuki
Masatoshi Yoshikawa
WikiSym English August 2012 In this paper, we propose a method to identify good quality Wikipedia articles by mutually evaluating editors and texts. A major approach for assessing article quality is based on the text survival ratio. In this approach, when a text survives multiple edits, the text is assessed as good quality. This approach assumes that poor quality texts are deleted by editors with high probability. However, many vandals frequently delete good quality texts, so the survival ratios of good quality texts are improperly decreased by vandals. As a result, many good quality texts are unfairly assessed as poor quality. In our method, we consider editor quality when calculating text quality, and decrease the impact on text quality of vandals, who have low editor quality. With this improvement, the accuracy of the text quality should be improved. However, an inherent problem of this idea is that the editor qualities are themselves calculated from the text qualities. To solve this problem, we mutually calculate the editor and text qualities until they converge. We conducted an experimental evaluation and confirmed that the proposed method could accurately assess the text qualities. 0 0
Staying in the Loop: Structure and Dynamics of Wikipedia's Breaking News Collaborations Brian Keegan
Darren Gergle
Noshir Contractor
WikiSym English August 2012 Despite the fact that Wikipedia articles about current events are more popular and attract more contributions than typical articles, canonical studies of Wikipedia have only analyzed articles about pre-existing information. We expect the co-authoring of articles about breaking news incidents to exhibit high-tempo coordination dynamics which are not found in articles about historical events and information. Using 1.03 million revisions made by 158,384 users to 3,233 English Wikipedia articles about disasters, catastrophes, and conflicts since 1990, we construct “article trajectories” of editor interactions as they coauthor an article. Examining a subset of this corpus, our analysis demonstrates that articles about current events exhibit structures and dynamics distinct from those observed among articles about non-breaking events. These findings have implications for how collective intelligence systems can be leveraged to process and make sense of complex information. 0 0
Wikipédia. Une somme originale de copies Rémi Mathis Médium French August 2012 How Wikipedia can reflect the knowledge of an era while rejecting copying. The question of copying in relation to Wikipedia is addressed at three levels: (1) Wikipedia is a synthesis of knowledge, but its licence requires it to be fundamentally original; (2) Wikipedia as a copy of encyclopedias or as a new model; (3) Wikipedia as a source of texts ready to be copied. 0 0
Writing up rather than writing down: Becoming Wikipedia Literate Heather Ford
R. Stuart Geiger
WikiSym English August 2012 Editing Wikipedia is certainly not as simple as learning the MediaWiki syntax and knowing where the “edit” bar is, but how do we conceptualize the cultural and organizational understandings that make an effective contributor? We draw on work of literacy practitioner and theorist Richard Darville to advocate a multi-faceted theory of literacy that sheds light on what new knowledges and organizational forms are required to improve participation in Wikipedia’s communities. We outline what Darville refers to as the “background knowledges” required to be an empowered, literate member and apply this to the Wikipedia community. Using a series of examples drawn from interviews with new editors and qualitative studies of controversies in Wikipedia, we identify and outline several different literacy asymmetries. 0 0
Wikipédia, un projet hors normes ? Rémi Bachelet
Alexandre Moatti
Responsabilité & Environnement (Annales des Mines) French 24 July 2012 Wikipedia and ISO both represent a crystallization of knowledge, whether know-how (ISO) or encyclopedic knowledge (Wikipedia). Both are founded on the search for consensus and on collaboration in the form of written texts. From the outset Wikipedia adopted rules, with its five founding principles. Its rise led to the development of a meta space (e.g. talk pages) whose operation required codification. 2 0
Wikipedia de la A a la W Tomás Saorín-Pérez Editorial UOC Spanish July 2012 Wikipedia is a reality that works, even though in theory it may seem an unattainable dream. A handful of enthusiasts has redefined the classic concept of the encyclopedia from scratch and built the most widely used reference source in history. Is its quality sufficient? The answer is yes, and to justify it one must look closely at the mechanisms it is equipped with, which allow it to reach whatever level of quality is desired by combining the efforts of thousands of self-organized volunteer editors. Wikipedia is at once content and people. It is time to get to know it from the inside and to strengthen its commitment to open knowledge from within cultural, scientific and educational institutions. Participating in Wikipedia makes it possible to learn from this incredible global laboratory for the social construction of organized information. 0 0
Who Deletes Wikipedia? English 6 June 2012 0 0
Reverts Revisited: Accurate Revert Detection in Wikipedia Fabian Flöck
Denny Vrandečić
Elena Simperl
Hypertext and Social Media 2012 English June 2012 Wikipedia is commonly used as a proving ground for research in collaborative systems. This is likely due to its popularity and scale, but also to the fact that large amounts of data about its formation and evolution are freely available to inform and validate theories and models of online collaboration. As part of the development of such approaches, revert detection is often performed as an important pre-processing step in tasks as diverse as the extraction of implicit networks of editors, the analysis of edit or editor features and the removal of noise when analyzing the emergence of the content of an article. The current state of the art in revert detection is based on a rather naïve approach, which identifies revision duplicates based on MD5 hash values. This is an efficient, but not very precise technique that forms the basis for the majority of research based on revert relations in Wikipedia. In this paper we prove that this method has a number of important drawbacks - it only detects a limited number of reverts, while simultaneously misclassifying too many edits as reverts, and not distinguishing between complete and partial reverts. This is very likely to hamper the accurate interpretation of the findings of revert-related research. We introduce an improved algorithm for the detection of reverts based on word tokens added or deleted to address these drawbacks. We report on the results of a user study and other tests demonstrating the considerable gains in accuracy and coverage by our method, and argue for a positive trade-off, in certain research scenarios, between these improvements and our algorithm’s increased runtime. 13 0
Wikipédia et les bibliothèques : dix ans après Rémi Mathis Bibliothèques 2.0 : à l'heure des médias sociaux French June 2012 An overview of the relationship between libraries and Wikipedia in 2012. 1 0
What We Know About Wikipedia: A Review of the Literature Analyzing the Project(s) Nicolas Jullien Social Science Research Network English 7 May 2012 This article proposes a review of the literature analyzing Wikipedia as a collective system for producing knowledge. 279 1
Panorama of the wikimediasphere David Gómez-Fontanills Digithum English
Catalan
May 2012 The term wikimediasphere is proposed to refer to the group of WikiProjects, communities of editors, guidelines and organisations structured around the Wikimedia movement to generate free knowledge that is available to everyone. A description is made of the wikimediasphere, presenting the main projects and their characteristics, and its community, technological, regulatory, social and institutional dimensions are outlined. The wikimediasphere is placed in context and reference is made to its blurred boundaries. An explanation is provided of the role of the communities of editors of each project and their autonomy with respect to each other and to the Wikimedia Foundation. The author concludes by offering a panoramic view of the wikimediasphere. 10 0
The Truth of Wikipedia Nathaniel Tkacz Digithum English
Catalan
May 2012 What does it mean to assert that Wikipedia has a relation to truth? That there is, despite regular claims to the contrary, an entire apparatus of truth in Wikipedia? In this article, I show that Wikipedia has in fact two distinct relations to truth: one which is well known and forms the basis of existing popular and scholarly commentaries, and another which refers to equally well-known aspects of Wikipedia, but has not been understood in terms of truth. I demonstrate Wikipedia’s dual relation to truth through a close analysis of the Neutral Point of View core content policy (and one of the project’s “Five Pillars”). I conclude by indicating what is at stake in the assertion that Wikipedia has a regime of truth and what bearing this has on existing commentaries. 7 0
Using Wikipedia to develop language resources: WordNet 3.0 in Catalan and Spanish Antoni Oliver
Salvador Climent
Digithum English
Catalan
May 2012 We describe the state of the art in the use of Wikipedia for natural language processing tasks and also describe three applications of our own that enrich a powerful language resource: WordNet version 3.0 in Catalan and Spanish. Researchers have for many years sought applications that would take account of world knowledge in a more or less structured way, as this kind of knowledge has proven to be crucial to satisfactorily solving certain language processing tasks. Wikipedia may be the answer to the provision of this kind of information, as it is constantly updated and access is free. 17 0
Wikipedia's Role in Reputation Management: An Analysis of the Best and Worst Companies in the USA Marcia W. DiStaso
Marcus Messner
Digithum English
Catalan
May 2012 Being considered one of the best companies in the USA is a great honor, but this reputation does not exempt businesses from negativity in the collaboratively edited online encyclopedia Wikipedia. Content analysis of corporate Wikipedia articles for companies with the best and worst reputations in the USA revealed that negative content outweighed positive content irrespective of reputation. It was found that both the best and the worst companies had more negative than positive content in Wikipedia. This is an important issue because Wikipedia is not only one of the most popular websites in the world, but is also often the first place people look when seeking corporate information. Although there was more content on corporate social responsibility in the entries for the ten companies with the best reputations, this was still overshadowed by content referring to legal issues or scandals. Ultimately, public relations professionals need to regularly monitor and request updates to their corporate Wikipedia articles regardless of what kind of company they work for. 0 0
Edição colaborativa na Wikipédia: desafios e possibilidades Carlos Frederico de Brito d’Andréa Educação científica e cidadania: abordagens teóricas e metodológicas para a formação de pesquisadores juvenis Portuguese March 2012 14 0
Valorisation du bénévolat sur Wikipédia Vincent Juhel French February 2012 Wikipedia operates in an atypical way, and research on it mostly dwells on the quality of articles potentially written by anyone. In this professional thesis I sought to offer a quantitative and qualitative view of the real value this project brings to readers, writers and donors, and also of what it would have represented had it been a conventional company. The first objective was to assess the value of the work of these volunteers, which, despite being free, creates real wealth. Defining this wealth better also means convincing donors more effectively and carrying more weight with partners. The second objective was to outline a strategy seeking to maximize the value produced by a largely self-managed community of volunteers: to better control the value produced in order to better guide and motivate contributors' work. 0 0
A Breakdown of Quality Flaws in Wikipedia Maik Anderka
Benno Stein
2nd Joint WICOW/AIRWeb Workshop on Web Quality (WebQuality 12) English 2012 The online encyclopedia Wikipedia is a successful example of the increasing popularity of user generated content on the Web. Despite its success, Wikipedia is often criticized for containing low-quality information, which is mainly attributed to its core policy of being open for editing by everyone. The identification of low-quality information is an important task since Wikipedia has become the primary source of knowledge for a huge number of people around the world. Previous research on quality assessment in Wikipedia either investigates only small samples of articles, or else focuses on single quality aspects, like accuracy or formality. This paper targets the investigation of quality flaws, and presents the first complete breakdown of Wikipedia's quality flaw structure. We conduct an extensive exploratory analysis, which reveals (1) the quality flaws that actually exist, (2) the distribution of flaws in Wikipedia, and (3) the extent of flawed content. An important finding is that more than one in four English Wikipedia articles contains at least one quality flaw, 70% of which concern article verifiability. 0 0
A Cross-Lingual Dictionary for English Wikipedia Concepts Valentin I. Spitkovsky
Angel X. Chang
Proceedings of the Eighth International Conference on Language Resources and Evaluation English 2012 We present a resource for automatically associating strings of text with English Wikipedia concepts. Our machinery is bi-directional, in the sense that it uses the same fundamental probabilistic methods to map strings to empirical distributions over Wikipedia articles as it does to map article URLs to distributions over short, language-independent strings of natural language text. For maximal interoperability, we release our resource as a set of flat line-based text files, lexicographically sorted and encoded with UTF-8. These files capture joint probability distributions underlying concepts (we use the terms article, concept and Wikipedia URL interchangeably) and associated snippets of text, as well as other features that can come in handy when working with Wikipedia articles and related information. 5 0
A Wikipedia-based corpus reference tool Jason Ginsburg HCCE English 2012 This paper describes a dictionary-like reference tool that is designed to help users find information that is similar to what one would find in a dictionary when looking up a word, except that this information is extracted automatically from large corpora. For a particular vocabulary item, a user can view frequency information, part-of-speech distribution, word-forms, definitions, example paragraphs and collocations. All of this information is extracted automatically from corpora and most of this information is extracted from Wikipedia. Since Wikipedia is a massive corpus covering a diverse range of general topics, this information is probably very representative of how target words are used in general. This project has applications for English language teachers and learners, as well as for language researchers. 0 0
Advertising Keywords Recommendation for Short-Text Web Pages Using Wikipedia Weinan Zhang
Dingquan Wang
Gui-Rong Xue
Hongyuan Zha
ACM Trans. Intell. Syst. Technol. English 2012 0 0
Analysis of discussion contributions in translated Wikipedia articles Ari Hautasaari
Toru Ishida
English 2012 0 0
Bieber no more: First Story Detection using Twitter and Wikipedia Miles Osborne
Saša Petrović
Richard McCreadie
Craig Macdonald
Iadh Ounis
English 2012 Twitter is a well known source of information regarding breaking news stories. This aspect of Twitter makes it ideal for identifying events as they happen. However, a key problem with Twitter-driven event detection approaches is that they produce many spurious events, i.e., events that are wrongly detected or simply are of no interest to anyone. In this paper, we examine whether Wikipedia (when viewed as a stream of page views) can be used to improve the quality of discovered events in Twitter. Our results suggest that Wikipedia is a powerful filtering mechanism, allowing for easy blocking of large numbers of spurious events. Our results also indicate that events within Wikipedia tend to lag behind Twitter.
0 0
Biographical Social Networks on Wikipedia: A cross-cultural study of links that made history Pablo Aragón
Andreas Kaltenbrunner
David Laniado
Yana Volkovich
WikiSym English 2012 It is arguable whether history is made by great men and women or vice versa, but undoubtedly social connections shape history. Analysing Wikipedia, a global collective memory place, we aim to understand how social links are recorded across cultures. Starting with the set of biographies in the English Wikipedia we focus on the networks of links between these biographical articles on the 15 largest language Wikipedias. We detect the most central characters in these networks and point out culture-related peculiarities. Furthermore, we reveal remarkable similarities between distinct groups of language Wikipedias and highlight the shared knowledge about connections between persons across cultures. 0 0
Breaking news on wikipedia: dynamics, structures, and roles in high-tempo collaboration Brian C. Keegan Computer-Supported Cooperative Work English 2012 0 0
Building a standpoints web to support decision-making in wikipedia Jodi Schneider Computer-Supported Cooperative Work English 2012 0 0
Circadian patterns of Wikipedia editorial activity: A demographic analysis Taha Yasseri
Róbert Sumi
János Kertész
PLoS ONE English 2012 Wikipedia (WP) as a collaborative, dynamical system of humans is an appropriate subject of social studies. Each single action of the members of this society, i.e. editors, is well recorded and accessible. Using the cumulative data of 34 Wikipedias in different languages, we try to characterize and find the universalities and differences in temporal activity patterns of editors. Based on this data, we estimate the geographical distribution of editors for each WP in the globe. Furthermore we also clarify the differences among different groups of WPs, which originate in the variance of cultural and social features of the communities of editors. 10 1
Classroom Wikipedia participation effects on future intentions to contribute Cliff Lampe
Jonathan Obar
Elif Ozkaya
Paul Zube
Alcides Velasquez
Computer-Supported Cooperative Work English 2012 One of the biggest challenges faced by social media sites like Wikipedia is how to motivate users to contribute content. Research continues to demonstrate that only a small percentage of users contribute to user-generated content sites. In this study we assess the results of a Wikimedia Foundation initiative, which had graduate and undergraduate students from 22 U.S. universities contribute content to Wikipedia articles as part of their coursework. 185 students were asked about their participation in the initiative and their intention to participate on Wikipedia in the future. Results suggest that intentions to continue contributing are influenced by the initial attitude towards the class, and the degree to which students perceived they were writing for a global audience. 7 0
Conflict, criticism, or confidence: an empirical examination of the gender gap in wikipedia contributions Benjamin Collier
Julia Bear
Computer-Supported Cooperative Work English 2012 A recent survey of contributors to Wikipedia found that less than 15% of contributors are women. This gender contribution gap has received significant attention from both researchers and the media. A panel of researchers and practitioners has offered several insights and opinions as to why a gender gap exists in contributions despite gender anonymity online. The gender research literature suggests that the difference in contribution rates could be due to three factors: (1) the high levels of conflict in discussions, (2) dislike of critical environments, and (3) lack of confidence in editing other contributors' work. This paper examines these hypotheses regarding the existence of the gender gap in contribution by using data from an international survey of 176,192 readers, contributors, and former contributors to Wikipedia, including measures of demographics, education, motivation, and participation. Implications for improving the design and culture of online communities to be more gender inclusive are discussed. 0 0
Do editors or articles drive collaboration?: multilevel statistical network analysis of wikipedia coauthorship Brian Keegan
Darren Gergle
Noshir Contractor
Computer-Supported Cooperative Work English 2012 0 0
Emotions and dialogue in a peer-production community: the case of Wikipedia David Laniado
Carlos Castillo
Andreas Kaltenbrunner
Mayo Fuster Morell
WikiSym English 2012 This paper presents a large-scale analysis of emotions in conversations among Wikipedia editors. Our focus is on the emotions expressed by editors in talk pages, measured by using the Affective Norms for English Words (ANEW).

We find evidence that to a large extent women tend to participate in discussions with a more positive tone, and that administrators are more positive than non-administrators. Surprisingly, female non-administrators tend to behave like administrators in many aspects.

We observe that replies are on average more positive than the comments they reply to, preventing many discussions from spiralling down into conflict. We also find evidence of emotional homophily: editors having similar emotional styles are more likely to interact with each other.

Our findings offer novel insights into the emotional dimension of interactions in peer-production communities, and contribute to debates on issues such as the flattening of editor growth and the gender gap.
0 0
Learning from history: predicting reverted work at the word level in wikipedia Jeffrey Rzeszotarski
Aniket Kittur
Computer-Supported Cooperative Work English 2012 0 0
Negotiating Cultural Values in Social Media: A Case Study from Wikipedia Jonathan T. Morgan
Robert M. Mason
Karine Nahon
HICSS English 2012 0 0
Network Centrality and Contributions to Online Public Good--The Case of Chinese Wikipedia Chong (Alex) Wang
Xiaoquan (Michael) Zhang
HICSS English 2012 0 0
Omnipedia: Bridging the Wikipedia Language Gap Patti Bao
Brent Hecht
Samuel Carton
Mahmood Quaderi
Michael Horn
Darren Gergle
International Conference on Human Factors in Computing Systems English 2012 We present Omnipedia, a system that allows Wikipedia readers to gain insight from up to 25 language editions of Wikipedia simultaneously. Omnipedia highlights the similarities and differences that exist among Wikipedia language editions, and makes salient information that is unique to each language as well as that which is shared more widely. We detail solutions to numerous front-end and algorithmic challenges inherent to providing users with a multilingual Wikipedia experience. These include visualizing content in a language-neutral way and aligning data in the face of diverse information organization strategies. We present a study of Omnipedia that characterizes how people interact with information using a multilingual lens. We found that users actively sought information exclusive to unfamiliar language editions and strategically compared how language editions defined concepts. Finally, we briefly discuss how Omnipedia generalizes to other domains facing language barriers. 0 0
On the Evolution of Quality Flaws and the Effectiveness of Cleanup Tags in the English Wikipedia Maik Anderka
Benno Stein
Matthias Busse
Wikipedia Academy English 2012 The improvement of information quality is a major task for the free online encyclopedia Wikipedia. Recent studies targeted the analysis and detection of specific quality flaws in Wikipedia articles. To date, quality flaws have been exclusively investigated in current Wikipedia articles, based on a snapshot representing the state of Wikipedia at a certain time. This paper goes further, and provides the first comprehensive breakdown of the evolution of quality flaws in Wikipedia. We utilize cleanup tags to analyze the quality flaws that have been tagged by the Wikipedia community in the English Wikipedia, from its launch in 2001 until 2011. This leads to interesting findings regarding (1) the development of Wikipedia's quality flaw structure and (2) the usage and the effectiveness of cleanup tags. Specifically, we show that inline tags are more effective than tag boxes, and provide statistics about the considerable volume of rare and non-specific cleanup tags. We expect that this work will support the Wikipedia community in making quality assurance activities more efficient. 0 0
Overview of the 1st International Competition on Quality Flaw Prediction in Wikipedia Maik Anderka
Benno Stein
CLEF English 2012 The paper overviews the task "Quality Flaw Prediction in Wikipedia" of the PAN'12 competition. An evaluation corpus is introduced which comprises 1,592,226 English Wikipedia articles, of which 208,228 have been tagged to contain one of ten important quality flaws. Moreover, the performance of three quality flaw classifiers is evaluated. 0 0
Predicting Quality Flaws in User-generated Content: The Case of Wikipedia Maik Anderka
Benno Stein
Nedim Lipka
35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012) English 2012 The detection and improvement of low-quality information is a key concern in Web applications that are based on user-generated content; a popular example is the online encyclopedia Wikipedia. Existing research on quality assessment of user-generated content deals with the classification as to whether the content is high-quality or low-quality. This paper goes one step further: it targets the prediction of quality flaws, this way providing specific indications in which respects low-quality content needs improvement. The prediction is based on user-defined cleanup tags, which are commonly used in many Web applications to tag content that has some shortcomings. We apply this approach to the English Wikipedia, which is the largest and most popular user-generated knowledge source on the Web. We present an automatic mining approach to identify the existing cleanup tags, which provides us with a training corpus of labeled Wikipedia articles. We argue that common binary or multiclass classification approaches are ineffective for the prediction of quality flaws and hence cast quality flaw prediction as a one-class classification problem. We develop a quality flaw model and employ a dedicated machine learning approach to predict Wikipedia's most important quality flaws. Since in the Wikipedia setting the acquisition of significant test data is intricate, we analyze the effects of a biased sample selection. In this regard we illustrate the classifier effectiveness as a function of the flaw distribution in order to cope with the unknown (real-world) flaw-specific class imbalances. The flaw prediction performance is evaluated with 10,000 Wikipedia articles that have been tagged with the ten most frequent quality flaws: provided test data with little noise, four flaws can be detected with a precision close to 1. 0 0
Supporting collaboration in Wikipedia between language communities Ranjitha Gurunath Kulkarni
Gaurav Trivedi
Tushar Suresh
Miaomiao Wen
Zeyu Zheng
Carolyn Rose
English 2012 0 0
The people's encyclopedia under the gaze of the sages: a systematic review of scholarly research on Wikipedia Chitu Okoli
Mohamad Mehdi
Mostafa Mesgari
Finn Årup Nielsen
Arto Lanamäki
English 2012 Wikipedia has become one of the ten most visited sites on the Web, and the world’s leading source of Web reference information. Its rapid success has inspired hundreds of scholars from various disciplines to study its content, communication and community dynamics from various perspectives. This article presents a systematic review of scholarly research on Wikipedia. We describe our detailed, rigorous methodology for identifying over 450 scholarly studies of Wikipedia. We present the WikiLit website (http://wikilit.referata.com), where most of the papers reviewed here are described in detail. In the major section of this article, we then categorize and summarize the studies. An appendix features an extensive list of resources useful for Wikipedia researchers. 15 0
There is No Deadline - Time Evolution of Wikipedia Discussions Andreas Kaltenbrunner
David Laniado
WikiSym English 2012 Wikipedia articles are by definition never finished: at any moment their content can be edited, or discussed in the associated talk pages. In this study we analyse the evolution of these discussions to unveil patterns of collective participation along the temporal dimension, and to shed light on the process of content creation on different topics. At a micro-scale, we investigate peaks in the discussion activity and we observe a non-trivial relationship with edit activity. At a larger scale, we introduce a measure to account for how fast discussions grow in complexity, and we find speeds that span three orders of magnitude for different articles. Our analysis should help the community in tasks such as early detection of controversies and assessment of discussion maturity. 0 0
Wikidata: a new platform for collaborative data collection Denny Vrandečić International conference companion on World Wide Web English 2012 This year, Wikimedia starts to build a new platform for the collaborative acquisition and maintenance of structured data: Wikidata. Wikidata's prime purpose is to be used within the other Wikimedia projects, like Wikipedia, to provide well-maintained, high-quality data. The nature and requirements of the Wikimedia projects call for a few novel, or at least unusual, features in Wikidata: Wikidata will be a secondary database, i.e. instead of containing facts it will contain references for facts. It will be fully internationalized. It will contain inconsistent and contradictory facts, in order to represent the diversity of knowledge about a given entity. 0 0
Wikipedia Lover, Not a Hater: Harnessing Wikipedia to Increase the Discoverability of Library Resources Danielle Elder
R. Niccole Westbrook
Michele Reilly
Journal of Web Librarianship English 2012 During the spring of 2010, the University of Houston Libraries Digital Services Department began an initiative to promote existing and upcoming collections in the University of Houston Digital Library and drive traffic to the online repository. Spurred by an OCLC report (De Rosa et al. 2005) that only two percent of college and university students began research by consulting library resources, University of Houston Digital Services staff sought to add content from the University of Houston Digital Library to Wikipedia in order to insert primary source digital materials into the research workflow of students and faculty. As a result, referrals from Wikipedia to the University of Houston Digital Library have increased significantly and the pilot project is now the basis for an ongoing University of Houston Digital Services program. The structure and direction of the pilot project were a collaborative effort between University of Houston Digital Services staff and a University of North Texas Library and Information Science intern participating in the University of Houston Digital Services Digital Library Internship Program. Through this case study the authors cover the evolution of the University of Houston Digital Services Wikipedia pilot project and its growth into a permanent program. The authors also outline the workflows and procedures of the project and describe in detail the challenges and successes of the pilot Wikipedia project at University of Houston Digital Services. Included are lessons learned for libraries and cultural institutions interested in establishing a similar program. 0 1
Is Wikipedia Inefficient? Modelling Effort and Participation in Wikipedia Kevin Crowston
Nicolas Jullien
Felipe Ortega
HICSS 2013 English 17 November 2011 Concerns have been raised about the decreased ability of Wikipedia to recruit editors and to harness the effort of contributors to create new articles and imp 0 0
Accuracy and completeness of drug information in Wikipedia: an assessment Natalie Kupferberg
Bridget McCrate Protus
Journal of the Medical Library Association English October 2011 0 1
Autonomous Link Spam Detection in Purely Collaborative Environments Andrew G. West
Avantika Agrawal
Phillip Baker
Brittney Exline
Insup Lee
WikiSym English October 2011 Collaborative models (e.g., wikis) are an increasingly prevalent Web technology. However, the open-access that defines such systems can also be utilized for nefarious purposes. In particular, this paper examines the use of collaborative functionality to add inappropriate hyperlinks to destinations outside the host environment (i.e., link spam). The collaborative encyclopedia, Wikipedia, is the basis for our analysis.

Recent research has exposed vulnerabilities in Wikipedia's link spam mitigation, finding that human editors are latent and dwindling in quantity. To this end, we propose and develop an autonomous classifier for link additions. Such a system presents unique challenges. For example, low barriers-to-entry invite a diversity of spam types, not just those with economic motivations. Moreover, issues can arise with how a link is presented (regardless of the destination).

In this work, a spam corpus is extracted from over 235,000 link additions to English Wikipedia. From this, 40+ features are codified and analyzed. These indicators are computed using "wiki" metadata, landing site analysis, and external data sources. The resulting classifier attains 64% recall at 0.5% false-positives (ROC-AUC=0.97). Such performance could enable egregious link additions to be blocked automatically with low false-positive rates, while prioritizing the remainder for human inspection. Finally, a live Wikipedia implementation of the technique has been developed.
0 0
What Wikipedia Deletes: Characterizing Dangerous Collaborative Content Andrew G. West
Insup Lee
WikiSym English October 2011 Collaborative environments, such as Wikipedia, often have low barriers-to-entry in order to encourage participation. This accessibility is frequently abused (e.g., vandalism and spam). However, certain inappropriate behaviors are more threatening than others. In this work, we study contributions which are not simply "undone" -- but *deleted* from revision histories and public view. Such treatment is generally reserved for edits which: (1) present a legal liability to the host (e.g., copyright issues, defamation), or (2) present privacy threats to individuals (i.e., contact information). Herein, we analyze one year of Wikipedia's public deletion log and use brute-force strategies to learn about privately handled redactions. This permits insight about the prevalence of deletion, the reasons that induce it, and the extent of end-user exposure to dangerous content. While Wikipedia's approach is generally quite reactive, we find that copyright issues prove most problematic of those behaviors studied. 0 0
Entre o agrupamento e a comunidade virtual: colaboração e conflitos na edição das biografias dos jogadores “Adriano” e “Ronaldo” na Wikipédia em português Carlos Frederico de Brito d’Andréa XXXIV Congresso Brasileiro de Ciências da Comunicação Portuguese September 2011 9 0
Link Spamming Wikipedia for Profit Andrew G. West
Jian Chang
Krishna Venkatasubramanian
Oleg Sokolsky
Insup Lee
CEAS '11: Proc. of the 8th Annual Collaboration, Electronic Messaging, Anti-Abuse, and Spam Conference English September 2011 Collaborative functionality is an increasingly prevalent web technology. To encourage participation, these systems usually have low barriers-to-entry and permissive privileges. Unsurprisingly, ill-intentioned users try to leverage these characteristics for nefarious purposes. In this work, a particular abuse is examined -- link spamming -- the addition of promotional or otherwise inappropriate hyperlinks.

Our analysis focuses on the wiki model and the collaborative encyclopedia, Wikipedia, in particular. A principal goal of spammers is to maximize *exposure*, the quantity of people who view a link. Creating and analyzing the first Wikipedia link spam corpus, we find that existing spam strategies perform quite poorly in this regard. The status quo spamming model relies on link persistence to accumulate exposures, a strategy that fails given the diligence of the Wikipedia community. Instead, we propose a model that exploits the latency inherent in human anti-spam enforcement.

Statistical estimation suggests our novel model would produce significantly more link exposures than status quo techniques. More critically, the strategy could prove economically viable for perpetrators, incentivizing its exploitation. To this end, we address mitigation strategies.
0 0
Processos editoriais auto-organizados na Wikipédia em português: a edição colaborativa de "Biografias de Pessoas Vivas" Carlos Frederico de Brito d’Andréa Portuguese September 2011 This dissertation maps and analyzes the editing dynamics in a sample of articles from the Portuguese version of Wikipedia. We identify and discuss the self-organized and collaborative processes in its editorial network, as well as how the editors rewrite the articles over time. This research begins with conceptual considerations about the “encyclopedia that anyone can edit”, focusing on trends in the Portuguese version and specifically on the “Biographies of Living People”, which are characterized by the possibility of including, “in real time”, factual information about the life and work of influential people. The theoretical framework draws on authors from different areas. In Text Linguistics, we discuss the concepts of text (BEAUGRANDE, 1997; COSCARELLI, 2006), textuality (COSTA VAL, 2004), retextualization and rewriting (DELL’ISOLA, 2007; MARCUSCHI, 2000; MATENCIO, 2002). In addition, we discuss the editorial processes and professional activities (such as copy editing) in the “production networks” of books and encyclopedias, especially after the adoption of digital technologies. In chapter 3, we discuss networked editorial production based on the internet and inspired by “hacker culture” and “open source software”. In this context, the most important concepts are “commons-based peer production” (BENKLER, 2006), “The Wisdom of Crowds” (SUROWIECKI, 2007), “produsage” (BRUNS, 2008), “virtual community” and “crowdsourcing” (HAYTHORNTHWAITE, 2009). We also present the relationships between this new model and traditional editorial processes, such as the “networked book” and “wiki-journalism”.
After that, we relate networked editorial production to the complexity paradigm and discuss Wikipedia as a complex adaptive system (HOLLAND, 1995; LARSEN-FREEMAN and CAMERON, 2008) that potentially operates with self-organized and emergent dynamics (DEBRUN, 1996a, 1996b; DE WOLF and HOLVOET, 2005). The empirical study of this thesis is based on 91 “Biographies of Living People” about the most influential Brazilian personalities of 2009 according to two national magazines (“Época” and “Isto É”). In the quantitative phase of this work, we extracted data from the articles' history pages using software (WikipediAnalyserPT) developed for this research. After performing statistical analyses, we compared the editing processes of these articles using variables such as “total number of edits”, “edits made by groups of editors” (registered, non-registered, administrators and bots), “protections”, “reversions” etc. In the qualitative stage, we detail the editing dynamics of five of the articles and analyze the rewritings of the texts and the interactions between the editors. Three articles were chosen because their “key variables” are very similar: the biographies of “Franklin Martins” (a journalist who worked in President Lula's government), “Kátia Abreu” (a senator known for defending owners of very large land areas) and “Ricardo Teixeira” (a president of the Brazilian Football Confederation). After that, we analyze the dynamics of two of the most edited articles of the sample: the biographies of the famous soccer players "Adriano Leite Ribeiro" (nicknamed "The Emperor") and "Ronaldo Nazario of Lima" (also known as "The Phenomenon"). In the three intermediate articles, we identified a relative stability (caused by a small number of monthly edits) interspersed with short periods with more edits and disputes. We also observed that a few editors made almost all the “important” edits.
In the two most edited biographies, we noticed uninterrupted activity by the editors, hundreds of acts of vandalism and many edit wars. Although in these articles, too, only a few edits are preserved, we identify an “emergence” pattern characterized by disputes that encourage collaboration among agents. In the conclusion, we discuss the possibilities and challenges of a “wikification” of editorial processes. 60 0
To Wiki or Not to Wiki? Lori Byrd Phillips Museum English September 2011 0 0
Wiki: Escrita colaborativa Ana Elisa Costa Novais
Ana Elisa Ferreira Ribeiro
Carlos Frederico de Brito d’Andréa
Presença Pedagógica Portuguese September 2011 2 0
Coerência entre princípios e práticas na Wikipédia Lusófona: uma análise semiótica Paulo Henrique Souto Maior Serrano Portuguese July 2011 This paper presents the method, the analysis and the results of a study that examined the operating dynamics of the Lusophone version of Wikipedia, the free encyclopedia, and the consistency between its conduct guidelines and its editing practice. This work uses information and content published under the Creative Commons / Share Alike 3.0 license, which requires the resulting work to be distributed under the same license. The online encyclopedia can be freely changed by users who browse its contents. Discussions on whether published information should be kept or altered are held on a special discussion page where people can argue about differences of opinion and reach consensus. This process arises from cognitive and pragmatic sanctions applied to the themes and figures that make up the thematic isotopy of the users' enunciation. The identification of these elements in this dissertation was carried out using Greimas' semiotics. Sanctions should pragmatically represent the guidelines of the collaborative process on Wikipedia, but there are institutionalized rules that are presented to users as the five pillars of Wikipedia. The five pillars concern encyclopedism, neutral point of view, free license, community conviviality and liberality in the rules. The statute assigns values to the practice of encyclopedias and to the information that they publish. These values were defined by tensive semiotics and compared with the cognitive and pragmatic sanctions of the isotopies enunciated by users, to check the consistency between what is being requested by Wikipedia and what is being done by its contributors.
The results of this comparison show some similarities and differences between discourse and practice, indicating ownership of Wikipedia by its users and the need for more accuracy and clearer criteria in conflicting issues or controversies over the permanence of information on the entry page. The verifiability of information emerged as a theme greatly valued by users, indicating the importance of the veracity of reference sources and the verification of information. The freedoms and distribution of powers introduced by the principles are denied in the practice of editing. Wikipedia presented itself as a very liberal and tolerant encyclopedia, giving substance to collaboration, but, in practice, it is very restrictive and careful when it comes to the permanence of content on the article page. 4 1
Factual accuracy and trust in information: The role of expertise Teun Lucassen
Jan Maarten Schraagen
Journal of the American Society for Information Science and Technology English July 2011 In the past few decades, the task of judging the credibility of information has shifted from trained professionals (e.g., editors) to end users of information (e.g., casual Internet users). Lacking training in this task, it is highly relevant to research the behavior of these end users. In this article, we propose a new model of trust in information, in which trust judgments are dependent on three user characteristics: source experience, domain expertise, and information skills. Applying any of these three characteristics leads to different features of the information being used in trust judgments; namely source, semantic, and surface features (hence, the name 3S-model). An online experiment was performed to validate the 3S-model. In this experiment, Wikipedia articles of varying accuracy (semantic feature) were presented to Internet users. Trust judgments of domain experts on these articles were largely influenced by accuracy whereas trust judgments of novices remained mostly unchanged. Moreover, despite the influence of accuracy, the percentage of trusting participants, both experts and novices, was high in all conditions. Along with the rationales provided for such trust judgments, the outcome of the experiment largely supports the 3S-model, which can serve as a framework for future research on trust in information. 0 0
Wiki Readers Wiki Writers Thomas W. Reynolds Jr English July 2011 In 1995, the first wiki website, Ward Cunningham's WikiWikiWeb, went public for the use of a community of computer programmers, and few outside of that community and those working in similar fields would have imagined wiki technology, a technology that allows visitors to a wiki-based web site to modify its structure and content. Fifteen years later, however, wiki comes to compositionists an already-loaded term. The mainstream media depicts wiki as a challenge to the ways we think about who writes and disseminates information, the nature of information itself, and who reads and how they read and use that information. At the same time, scholarship in the field of composition studies claims wiki as a writing tool that evidences and provides the process-centered, collaborative, democratized space for which researchers and teachers of writing have been looking. In both cases, the literature constructs ideas about what it means to be a writer and a reader in relation to wiki so that compositionists encounter wiki technology as always already described and defined. I analyze these oppositional perspectives on wiki technology and make it possible to move through, before, and beyond these constructions of readers and writers and the intellectual traditions through which they are made possible to make space for other readings of wiki technology and answer the following questions: How are the traditional roles of reader and writer articulated or challenged in the discourse surrounding wiki technology? How are the roles of readers and writers made possible through applications of wiki technology? I analyze the discourse surrounding wiki technology and then the writer and reader functions made possible in three wiki applications: Wikipedia, Scholarpedia, and Citizendium.
It is the argument of this project that wiki makes visible and explicit the ways in which readers and writers have always already interacted, or at least desired to interact, providing a deeper and different understanding of the roles assumed by and constructed for readers and writers, an understanding that is situated within, without, and in the margins of the traditions that have always already constructed them (and wiki technology) differently. 32 0
Wikipédia e enciclopédia britânica: Informação confiável? Aline Luli Romero Ribeiro
Cláudio Gottschalg-Duque
Revista Brasileira de Biblioteconomia e Documentação Portuguese July 2011 This article presents the results of an academic study that examined the reliability of the information in two reference works, in digital format and in English, Wikipedia and Encyclopædia Britannica, within the field of Library Science, by evaluating similar entries. In order to determine the level of reliability of each of these encyclopedias, and based on concepts from Information Architecture, it aims to analyze whether the ban on citing Wikipedia in academic settings is justified. 11 0
An Introductory Historical Contextualization of Online Creation Communities for the Building of Digital Commons: The Emergence of a Free Culture Movement Mayo Fuster Morell Proceedings of the 6th Open Knowledge Conference English June 2011 Online Creation Communities (OCCs) are a set of individuals that communicate, interact and collaborate; in several forms and degrees of participation which are eco-systemically integrated; mainly via a platform of participation on the Internet, on which they depend; and aiming at knowledge-making and sharing. The paper will first provide an historical contextualization of OCCs. Then, it will show how the development of OCCs is fuelled by, and contributes to, the rise of a free culture movement defending and advocating the creation of digital commons, and provide an empirically grounded definition of the free culture movement. The empirical analysis is based on content analysis of 80 interviews with free culture practitioners, promoters and activists with an international background or rooted in Europe, the USA and Latin America, and on the content analysis of two seminar discussions. The data were collected from 2008 to 2010. 0 0
Towards a diversity-minded Wikipedia Fabian Flöck
Denny Vrandečić
Elena Simperl
WebSci Conference English June 2011 Wikipedia is a top-ten Web site providing a free encyclopedia created by an open community of volunteer contributors. As investigated in various studies over the past years, contributors have different backgrounds, mindsets and biases; however, the effects - positive and negative - of this diversity on the quality of the Wikipedia content, and on the sustainability of the overall project are yet only partially understood. In this paper we discuss these effects through an analysis of existing scholarly literature in the area and identify directions for future research and development; we also present an approach for diversity-minded content management within Wikipedia that combines techniques from semantic technologies, data and text mining and quantitative social dynamics analysis to create greater awareness of diversity-related issues within the Wikipedia community, give readers access to indicators and metrics to understand biases and their impact on the quality of Wikipedia articles, and support editors in achieving balanced versions of these articles that leverage the wealth of knowledge and perspectives inherent to large-scale collaboration. 24 1
Wikipedia & Research: The innovative character of Wikipedia research and the new challenges (and opportunities) associated with it Mayo Fuster Morell Proceedings of the 6th Open Knowledge Conference English June 2011 The workshop will focus on addressing the stage of Wikipedia research and, in general, common-based peer production (less focused on the content than on the methodologies and research process itself) and the innovations, problems and new insights regarding (action) research on common-based peer production. 0 0
Evaluating WikiTrust: A trust support tool for Wikipedia Teun Lucassen
Jan Maarten Schraagen
First Monday English May 2011 Because of the open character of Wikipedia readers should always be aware of the possibility of false information. WikiTrust aims at helping readers to judge the trustworthiness of articles by coloring the background of less trustworthy words in a shade of orange. In this study we look into the effects of such coloring on reading behavior and trust evaluation by means of an eye-tracking experiment. The results show that readers had more difficulties reading the articles with coloring than without coloring. Trust in heavily colored articles was lower. The main concern is that the participants in our experiment rated usefulness of WikiTrust low. 7 0
Hackers, Cyborgs, and Wikipedians: The Political Economy and Cultural History of Wikipedia Andrew A. Famiglietti English May 2011 This dissertation explores the political economy and cultural history of Wikipedia, the free encyclopedia. It demonstrates how Wikipedia, an influential and popular site of knowledge production and distribution, was influenced by its heritage from the hacker communities of the late twentieth century. More specifically, Wikipedia was shaped by an ideal I call “the cyborg individual,” which held that the production of knowledge was best entrusted to a widely distributed network of individual human subjects and individually owned computers. I trace how this ideal emerged from hacker culture in response to anxieties hackers experienced due to their intimate relationships with machines. I go on to demonstrate how this ideal influenced how Wikipedia was understood both by those involved in the early history of the site and by those writing about it. In particular, legal scholar Yochai Benkler seems to base his understanding of Wikipedia and its strengths on the cyborg individual ideal. Having established this, I then move on to show how the cyborg individual ideal misunderstands Wikipedia's actual method of production. Most importantly, it overlooks the importance of how the boundaries drawn around communities and shared technological resources shape Wikipedia's content. I then proceed to begin the process of building what I believe is a better way of understanding Wikipedia, by tracing how communities and shared resources shape the production of recent Wikipedia articles. 70 0
La dimensió de les llengües a la Wikipedia i la seua relació amb els elements socials Borja Pellejero
Natxo Sorolla
Marina Nogué
Digithum Catalan May 2011 There would seem to be a contradiction in the fact that Catalan should have a Wikipedia with a similar number of pages to that in Chinese. There are fewer than ten million Catalan speakers, and they were marginalised in their own land for a long time, but they have still been able to produce content on the internet that in some cases matches that of China, a world economic superpower with nearly one billion Chinese speakers. Though it should be noted that the situation is not the same in China as it is in those places where Catalan is spoken. This article offers an initial look at the social, educational, technological, economic and demographic factors linked to a language’s position in the ranking of number of Wikipedia articles. This analysis is based on one key concept, that of the digital language community, and the observation that Catalan’s position on the internet is not due to the activism of its speakers, but to a position that resembles that of any other medium-sized language community. 8 0
A Characterization of Wikipedia Content Based on Motifs in the Edit Graph Guangyu Wu
Martin Harrigan
Pádraig Cunningham
SMUC '11: Proceedings of the 3rd international workshop on Search and mining user-generated contents English February 2011 Good Wikipedia articles are authoritative sources due to the collaboration of a number of knowledgeable contributors. This is the many eyes idea. The edit network associated with a Wikipedia article can tell us something about its quality or authoritativeness. In this paper we explore the hypothesis that the characteristics of this edit network are predictive of the quality of the corresponding article's content. We characterize the edit network using a profile of network motifs and we show that this network motif profile is predictive of the Wikipedia quality classes assigned to articles by Wikipedia editors. We further show that the network motif profile can identify outlier articles particularly in the 'Featured Article' class, the highest Wikipedia quality class. 8 0
Wikipedia Vandalism Detection: Combining Natural Language, Metadata, and Reputation Features B. Thomas Adler
Luca de Alfaro
Santiago M. Mola Velasco
Paolo Rosso
Andrew G. West
Lecture notes in computer science English February 2011 Wikipedia is an online encyclopedia which anyone can edit. While most edits are constructive, about 7% are acts of vandalism. Such behavior is characterized by modifications made in bad faith; introducing spam and other inappropriate content. In this work, we present the results of an effort to integrate three of the leading approaches to Wikipedia vandalism detection: a spatio-temporal analysis of metadata (STiki), a reputation-based system (WikiTrust), and natural language processing features. The performance of the resulting joint system improves the state-of-the-art from all previous methods and establishes a new baseline for Wikipedia vandalism detection. We examine in detail the contribution of the three approaches, both for the task of discovering fresh vandalism, and for the task of locating vandalism in the complete set of Wikipedia revisions. 0 1
A Research for the Centrality of Article Edit Collective in Wikipedia Dongjie Zhao
Haitao Yang
Jian Jiang
Deyi Li
Haisu Zhang
ICM English 2011 0 0
A generalized method for word sense disambiguation based on wikipedia Chenliang Li
Aixin Sun
Anwitaman Datta
ECIR English 2011 0 0
A gripe suína na Wikipédia em português: análise da dinâmica de edições e qualificação do conteúdo de dois artigos Bernardo Esteves Gonçalves da Costa
Carlos Frederico de Brito d’Andréa
Intexto Portuguese January 2011 This article analyzes and compares the collaborative editing of two articles about pandemic influenza A (H1N1), or swine flu, in the Portuguese-language edition of Wikipedia. We monitored the edits made to those articles during the month after they were created on April 25, 2009. We characterized the editing of the texts and the dynamics of the interactions among the editors. Additionally, we analyzed their contents according to three criteria: authority, verifiability and timeliness. 11 0
A link-based visual search engine for Wikipedia David N. Milne
Ian H. Witten
JCDL English 2011 0 0
A multimethod study of information quality in wiki collaboration Gerald C. Kane ACM Trans. Manage. Inf. Syst. English 2011 0 0
A quantitative examination of the impact of featured articles in Wikipedia Antonio J. Reinoso
Jesús M. González-Barahona
Rocío Muñoz Mansilla
Israel Herraiz
ICSOFT English 2011 This paper presents a quantitative examination of the impact of the presentation of featured articles as quality content in the main page of several Wikipedia editions. Moreover, the paper also presents the analysis performed to determine the number of visits received by the articles promoted to the featured status. We have analyzed the visits not only in the month when articles were awarded the promotion or were included in the main page, but also in the previous and following ones. The main aim of this is to assess the attention attracted by the featured content and the different dynamics exhibited by each community of users with respect to the promotion process. The main results of this paper are twofold: it shows how to extract relevant information related to the use of Wikipedia, which is an emerging research topic, and it analyzes whether the featured articles mechanism succeeds in attracting more attention. 3 0
AVBOT: Detecting and fixing vandalism in Wikipedia Emilio J. Rodríguez-Posada UPGRADE English 2011 Wikipedia is a project which aims to build a free encyclopaedia to spread the sum of all knowledge to every single human being. Today it can be said to be on the road to achieving that goal, having reached the 15 million articles milestone in 270 languages. Furthermore, if we include its sister projects (Wiktionary, Wikibooks, Wikisource,...), it has received more than 1 billion edits in 10 years and now has more than 10 billion page views every month. Compiling an encyclopaedia in a collaborative way has been possible thanks to MediaWiki software. It allows everybody to modify the content available on the site easily. But a problem emerges regarding this model: not all edits are made in good faith. AVBOT is a bot for protecting the Spanish Wikipedia against some undesired modifications known as vandalism. Although AVBOT was developed for Wikipedia, it can be used on any MediaWiki website. It is developed in Python and is free software. In the 2 years it has been in operation it has reverted more than 200,000 vandalism edits, while several clones have been executed, adding thousands of reverts to this count. 0 0
An exploratory study of navigating wikipedia semantically: model and application I-Chin Wu
Yi-Sheng Lin
Che-Hung Liu
OCSC English 2011 0 0
Automatically assigning Wikipedia articles to macro-categories Jacopo Farina
Riccardo Tasso
David Laniado
Hypertext English 2011 The online encyclopedia Wikipedia offers millions of articles which are organized in a hierarchical category structure, created and updated by users. In this paper we present a technique which leverages this rich and disordered graph to assign each article to one or more topics. We modify an existing approach, based on the shortest paths between categories, in order to account for the direction of the hierarchy. 0 0
Autopedia: automatic domain-independent Wikipedia article generation Conglei Yao
Xu Jia
Sicong Shou
Shicong Feng
Feng Zhou
Hongyan Liu
World Wide Web English 2011 0 0
Bancos de imágenes para proyectos enciclopédicos: el caso de Wikimedia Commons Tomás Saorín-Pérez
Juan-Antonio Pastor-Sánchez
El profesional de la información Spanish 2011 This paper presents the characteristics and functionalities of the Wikimedia Commons image databank shared by all Wikipedia projects. The process of finding images and illustrating Wikipedia articles is also explained, along with how to add images to the bank. The role of cultural institutions in promoting free and open cultural heritage content is highlighted. 5 1
Bootstrapping Multilingual Relation Discovery Using English Wikipedia and Wikimedia-Induced Entity Extraction Patrick Schone
Tim Allison
Chris Giannella
Craig Pfeifer
ICTAI English 2011 0 0
Building a signed network from interactions in Wikipedia Silviu Maniu
Bogdan Cautis
Talel Abdessalem
DBSocial English 2011 0 1
Casting a web of trust over Wikipedia: an interaction-based approach Silviu Maniu
Talel Abdessalem
Bogdan Cautis
World Wide Web English 2011 0 0
Characterization and prediction of Wikipedia edit wars Róbert Sumi
Taha Yasseri
András Rung
András Kornai
János Kertész
WebSci Conference English 2011 We present a new, efficient method for automatically detecting conflict cases and test it on five different language Wikipedias. We discuss how the number of edits and reverts and the length of discussions deviate in such pages from those following the general workflow. 4 2
Characterizing Wikipedia pages using edit network motif profiles Guangyu Wu
Martin Harrigan
Pádraig Cunningham
SMUC English 2011 Good Wikipedia articles are authoritative sources due to the collaboration of a number of knowledgeable contributors. This is the many eyes idea. The edit network associated with a Wikipedia article can tell us something about its quality or authoritativeness. In this paper we explore the hypothesis that the characteristics of this edit network are predictive of the quality of the corresponding article's content. We characterize the edit network using a profile of network motifs and we show that this network motif profile is predictive of the Wikipedia quality classes assigned to articles by Wikipedia editors. We further show that the network motif profile can identify outlier articles particularly in the 'Featured Article' class, the highest Wikipedia quality class. 0 0
Co-authorship 2.0: patterns of collaboration in Wikipedia David Laniado
Riccardo Tasso
Hypertext English 2011 The study of collaboration patterns in wikis can help shed light on the process of content creation by online communities. To turn a wiki's revision history into a collaboration network, we propose an algorithm that identifies as authors of a page the users who provided most of its relevant content, measured in terms of quantity and of acceptance by the community. The scalability of this approach allows us to study the English Wikipedia community as a co-authorship network. We find evidence of the presence of a nucleus of very active contributors, who seem to spread over the whole wiki, and to interact preferentially with inexperienced users. The fundamental role played by this elite is witnessed by the growing centrality of sociometric stars in the network. Isolating the community active around a category, it is possible to study its specific dynamics and most influential authors. 0 1
Collaborative Wikipedia Hosting English
Dutch
2011 0 0
Collective memory building in Wikipedia: The case of North African uprisings Michela Ferron
Paolo Massa
WikiSym English 2011 Since December 2010, a series of protests and uprisings have shocked North African countries such as Tunisia, Egypt, Libya, Syria, Yemen and more. In this paper, focusing mainly on the Egyptian revolution, we provide evidence of the intense edit activity that occurred during these uprisings on the related Wikipedia pages. Thousands of people provided their contribution on the content pages and discussed improvements and disagreements on the associated talk pages as the traumatic events unfolded. We propose to interpret this phenomenon as a process of collective memory building and argue how on Wikipedia this can be studied empirically and quantitatively in real time. We explore and suggest possible directions for future research on collective memory formation of traumatic and controversial events in Wikipedia. 14 0
Conceptual Indexing of Documents Using Wikipedia Carlo Abi Chahine
Nathalie Chaignaud
Jean-Philippe Kotowicz
Jean-Pierre Pecuchet
WI-IAT English 2011 0 0
Credibility Assessment Using Wikipedia for Messages on Social Network Services Yu Suzuki
Akiyo Nadamoto
DASC English 2011 0 0
Cross lingual text classification by mining multilingual topics from wikipedia Xiaochuan Ni
Jian T. Sun
Jian Hu
Zheng Chen
WSDM English 2011 0 0
Design and implementation of the Sweble Wikitext parser: unlocking the structured data of Wikipedia Hannes Dohrn
Dirk Riehle
WikiSym English 2011 0 0
Detection of Text Quality Flaws as a One-class Classification Problem Maik Anderka
Benno Stein
Nedim Lipka
20th ACM Conference on Information and Knowledge Management (CIKM 11) English 2011 For Web applications that are based on user generated content the detection of text quality flaws is a key concern. Our research contributes to automatic quality flaw detection. In particular, we propose to cast the detection of text quality flaws as a one-class classification problem: we are given only positive examples (= texts containing a particular quality flaw) and decide whether or not an unseen text suffers from this flaw. We argue that common binary or multiclass classification approaches are ineffective here, and we underpin our approach by a real-world application: we employ a dedicated one-class learning approach to determine whether a given Wikipedia article suffers from certain quality flaws. Since in the Wikipedia setting the acquisition of sensible test data is quite intricate, we analyze the effects of a biased sample selection. In addition, we illustrate the classifier effectiveness as a function of the flaw distribution in order to cope with the unknown (real-world) flaw-specific class imbalances. Altogether, provided test data with little noise, four out of ten important quality flaws in Wikipedia can be detected with a precision close to 1. 0 0
Discovering context: classifying tweets through a semantic transform based on wikipedia Yegin Genc
Yasuaki Sakamoto
Jeffrey V. Nickerson
FAC English 2011 0 0
Discussion about Translation in Wikipedia Ari Hautasaari
Toru Ishida
CULTURE-COMPUTING English 2011 0 0
Document Indexing and Retrieval Using Wikipedia Carlo Abi Chahine
Nathalie Chaignaud
Jean-Philippe Kotowicz
Jean-Pierre Pecuchet
SITIS English 2011 0 0
Don't bite the newbies: how reverts affect the quantity and quality of Wikipedia work Aaron Halfaker
Aniket Kittur
John Riedl
WikiSym English 2011 Reverts are important to maintaining the quality of Wikipedia. They fix mistakes, repair vandalism, and help enforce policy. However, reverts can also be damaging, especially to the aspiring editor whose work they destroy. In this research we analyze 400,000 Wikipedia revisions to understand the effect that reverts had on editors. We seek to understand the extent to which they demotivate users, reducing the workforce of contributors, versus the extent to which they help users improve as encyclopedia editors. Overall we find that reverts are powerfully demotivating, but that their net influence is that more quality work is done in Wikipedia as a result of reverts than is lost by chasing editors away. However, we identify key conditions – most specifically new editors being reverted by much more experienced editors – under which reverts are particularly damaging. We propose that reducing the damage from reverts might be one effective path for Wikipedia to solve the newcomer retention problem. 0 0
Effectively mining wikipedia for clustering multilingual documents N. Kiran Kumar
G. S. K. Santosh
Vasudeva Varma
NLDB English 2011 0 0
Enabling type/condition-specified entity/fact retrieval using semantic knowledge extracted from wikipedia Sofia J. Athenikos
Xia Lin
SMER English 2011 0 0
Examining the "leftness" property of Wikipedia categories Karl Gyllstrom
Marie-Francine Moens
CIKM English 2011 0 0
Exploring Wikipedia with HMpara David N. Milne
Ian H. Witten
JCDL English 2011 0 0
Exploring linguistic points of view of Wikipedia Paolo Massa
Federico Scrinzi
WikiSym English 2011 The 3 million articles of the English Wikipedia have been written since 2001 by more than 14 million volunteers. On each article, the community of editors strives to reach a neutral point of view, representing all significant views fairly, proportionately, and without bias. However, besides the English one, there are more than 270 Wikipedias in different languages and their relatively isolated communities of editors are not forced by the platform to discuss and negotiate their points of view. So the empirical question is: do communities on different language editions of Wikipedia develop their own diverse Linguistic Points of View (LPOV)? To answer this question we created Manypedia, a web tool whose goal is to ease cross-cultural comparisons of Wikipedia language communities by analyzing their different representations of the same topic. 0 1
Exploring wiki: measuring the quality of social media using ant colony metaphor Soumya Banerjee
Nashwa El-Bendary
Hameed Al-Qaheri
MEDES English 2011 0 0
Exploring wikipedia's category graph for query classification Milad Alemzadeh
Richard Khoury
Fakhri Karray
AIS English 2011 0 0
Extracción de Corpus Paralelos de la Wikipedia basada en la Obtención de Alineamientos Bilingües a Nivel de Frase Joan Albert Silvestre-Cerdà
Mercedes García-Martínez
Alberto Barrón-Cedeño
Jorge Civera
Paolo Rosso
Proceedings of the Workshop on Iberian Cross-Language Natural Language Processing Tasks (ICL 2011) Spanish 2011 This paper presents a proposal for extracting parallel corpora from Wikipedia on the basis of statistical machine translation techniques. We have used word-level alignment models from IBM in order to obtain phrase-level bilingual alignments between document pairs. We have manually annotated a set of English-Spanish comparable test documents in order to evaluate the model. The obtained results are encouraging. 4 0
Gender differences in Wikipedia editing Judd Antin
Raymond Yee
Coye Cheshire
Oded Nov
WikiSym English 2011 As Wikipedia has become an indispensable source of online information, concerns about who writes, edits, and maintains it have come to the forefront. In particular, the 2010 UNU-MERIT survey found evidence of a significant gender skew: fewer than 13% of Wikipedia contributors are women. However, the number of contributors is just one way to examine gender differences in contribution. In this paper we take a more fine-grained perspective by examining how much and what types of Wiki-work men and women tend to do. First, we find that the so-called “Gender Gap” in number of editors may not be as wide as prior studies have suggested. Second, although more than 80% of editors in our sample were men, among the bottom 75% of editors by activity level, we find that men and women made similar numbers of revisions. However, among the most active Wikipedians men tended to make many more revisions than women. Finally, we find that the most active women in our sample tended to make larger revisions than the most active men. We conclude by discussing directions for future research. 0 0
Graph-based named entity linking with wikipedia Ben Hachey
Will Radford
James R. Curran
WISE English 2011 0 0
GreenWiki: a tool to support users' assessment of the quality of Wikipedia articles Daniel Hasan Dalip
Raquel Lara Santos
Diogo Rennó Oliveira
Valéria Freitas Amaral
Marcos André Gonçalves
Raquel Oliveira Prates
Raquel C.M. Minardi
Jussara Marques de Almeida
JCDL English 2011 In this work, we present GreenWiki, which is a wiki with a panel of quality indicators to assist the reader of a Wikipedia article in assessing its quality. 4 0
Handling flammable materials: Wikipedia biographies of living persons as contentious objects Elisabeth Joyce
Brian Butler
Jacqueline Pike
IConference English 2011 0 0
Harvesting Wikipedia Knowledge to Identify Topics in Ongoing Natural Language Dialogs Alexa Breuing
Ulli Waltinger
Ipke Wachsmuth
WI-IAT English 2011 0 0
Hot off the Wiki: Dynamics, Practices, and Structures in Wikipedia’s Coverage of the Tōhoku Catastrophes Brian Keegan
Darren Gergle
Noshir Contractor
WikiSym English 2011 Wikipedia editors are uniquely motivated to collaborate around current and breaking news events. However, the speed, urgency, and intensity with which these collaborations unfold also impose a substantial burden on editors’ abilities to effectively coordinate tasks and process information. We analyze the patterns of activity on Wikipedia following the 2011 Tōhoku earthquake and tsunami to understand the dynamics of editor attention and participation, novel practices employed to collaborate on these articles, and the resulting coauthorship structures which emerge between editors and articles. Our findings have implications for supporting future coverage of breaking news articles, theorizing about motivations to participate in online community, and illuminating Wikipedia’s potential role in storing cultural memories of catastrophe. 0 0
Identifying verbal collocations in wikipedia articles István Nagy T.
Veronika Vincze
TSD English 2011 0 0
Information Quality in Wikipedia: The Effects of Group Composition and Task Conflict Ofer Arazy
Oded Nov
Raymond Patterson
Lisa Yeo
J. Manage. Inf. Syst. English 2011 0 1
Interlinking journal and wiki publications through joint citation: Working examples from ZooKeys and Plazi on Species-ID Lyubomir Penev
Gregor Hagedorn
Daniel Mietchen
Teodor Georgiev
Pavel Stoev
Guido Sautter
Donat Agosti
Andreas Plank
Michael Balke
Lars Hendrich
Terry Erwin
ZooKeys English 2011 Scholarly publishing and citation practices have developed largely in the absence of versioned documents. The digital age requires new practices to combine the old and the new. We describe how the original published source and a versioned wiki page based on it can be reconciled and combined into a single citation reference. We illustrate the citation mechanism by way of practical examples focusing on journal and wiki publishing of taxon treatments. Specifically, we discuss mechanisms for permanent cross-linking between the static original publication and the dynamic, versioned wiki, as well as for automated export of journal content to the wiki, to reduce the workload on authors, for combining the journal and the wiki citation and for integrating it with the attribution of wiki contributors. 9 0
Introducing New Features to Wikipedia: Case Studies for Web Science Mathias Schindler
Denny Vrandečić
IEEE Intelligent Systems English 2011 0 0
Language independent identification of parallel sentences using Wikipedia Rohit G. Bharadwaj
Vasudeva Varma
World Wide Web English 2011 0 0
Large-scale question classification in cQA by leveraging Wikipedia semantic knowledge Li Cai
Guangyou Zhou
Kang Liu
Jun Zhao
CIKM English 2011 0 0
Lessons from the classroom: successful techniques for teaching wikis using Wikipedia Frank Schulenburg
LiAnna Davis
Max Klein
WikiSym English 2011 0 0
Leveraging Wikipedia concept and category information to enhance contextual advertising Zongda Wu
Guandong Xu
Rong Pan
Yanchun Zhang
Zhiwen Hu
Jianfeng Lu
CIKM English 2011 0 0
Link spamming Wikipedia for profit Andrew G. West
Jian Chang
Krishna Venkatasubramanian
Oleg Sokolsky
Insup Lee
CEAS English 2011 Collaborative functionality is an increasingly prevalent web technology. To encourage participation, these systems usually have low barriers-to-entry and permissive privileges. Unsurprisingly, ill-intentioned users try to leverage these characteristics for nefarious purposes. In this work, a particular abuse is examined: link spamming, the addition of promotional or otherwise inappropriate hyperlinks. Our analysis focuses on the wiki model and the collaborative encyclopedia, Wikipedia, in particular. A principal goal of spammers is to maximize exposure, the quantity of people who view a link. Creating and analyzing the first Wikipedia link spam corpus, we find that existing spam strategies perform quite poorly in this regard. The status quo spamming model relies on link persistence to accumulate exposures, a strategy that fails given the diligence of the Wikipedia community. Instead, we propose a model that exploits the latency inherent in human anti-spam enforcement. Statistical estimation suggests our novel model would produce significantly more link exposures than status quo techniques. More critically, the strategy could prove economically viable for perpetrators, incentivizing its exploitation. To this end, we address mitigation strategies. 0 0
Measuring Hyperlink Distances: Wikipedia Case Study Rodrigo Rodrigues Paim
Daniel Ratton Figueiredo
WebSci Conference English 2011 Hyperlinks are a fundamental aspect of the Web, as they play a major role in accomplishing important functions such as document clustering and document ranking. Despite various facets of hyperlink analysis, in this work we consider a novel aspect of hyperlinks, namely their distance. How far in terms of contextual similarity will a hyperlink take you? We consider classical distance functions that capture the similarity between documents as well as propose a new distance function, an IDF-based generalization of Jaccard distance. We characterize the distance distribution of hyperlinks considering Wikipedia as a case study. Our results indicate that hyperlink distances are strongly skewed, with the majority of hyperlinks exhibiting very long distances. 0 0
Measuring Semantic Relatedness Using Wikipedia Revision Information in a Signed Network Wen-Teng Yang
Hung-Yu Kao
TAAI English 2011 0 0
Mentoring in Wikipedia: a clash of cultures David R. Musicant
Yuqing Ren
James A. Johnson
John Riedl
WikiSym English 2011 0 0
Mobile wikipedia: a case study of information service design for chinese teenagers Jia Zhou
P. L. Patrick Rau
Christoph Rohmer
Jie Zhou
Christophe Ghalayini
Felix Roerig
UAHCI English 2011 0 0
Modelling Provenance of DBpedia Resources Using Wikipedia Contributions Fabrizio Orlandi
Alexandre Passant
Web Semantics: Science, Services and Agents on the World Wide Web English 2011 DBpedia is one of the largest datasets in the Linked Open Data cloud. Its centrality and its cross-domain nature make it one of the most important and most referred to knowledge bases on the Web of Data, generally used as a reference for data interlinking. Yet, in spite of its authoritative aspect, there is no work so far tackling the provenance aspect of DBpedia statements. Because DBpedia is extracted from Wikipedia, an open and collaborative encyclopedia, delivering provenance information about it would help to ensure trustworthiness of its data, a major need for people using DBpedia data for building applications. To overcome this problem, we propose an approach for modelling and managing provenance on DBpedia using Wikipedia edits, and making this information available on the Web of Data. In this paper, we describe the framework that we implemented to do so, consisting of (1) a lightweight modelling solution to semantically represent provenance of both DBpedia resources and Wikipedia content, along with mappings to popular ontologies such as the W7 (what, when, where, how, who, which, and why) and OPM (Open Provenance Model) models, (2) an information extraction process and a provenance-computation system combining Wikipedia articles’ history with DBpedia information, (3) a set of scripts to make provenance information about DBpedia statements directly available when browsing this source, as well as being publicly exposed in RDF for letting software agents consume it. 0 0
Multilingual document clustering using wikipedia as external knowledge N. Kiran Kumar
K. G. S. Santosh
Vasudeva Varma
IRFC English 2011 0 0
Participation in Wikipedia's article deletion processes R. Stuart Geiger
Heather Ford
WikiSym English 2011 0 0
Places on the map and in the cloud: representations of locality and geography in Wikipedia Randall M. Livingstone WikiSym English 2011 0 0
Posibilidades de Wikipedia en la docencia universitaria: elaboración colaborativa de conocimiento Tomás Saorín-Pérez
María Verónica de Haro de San Mateo
Juan-Antonio Pastor-Sánchez
IBERSID Spanish 2011 A guide for Wikipedia student edition as a collaborative active learning activity is presented. Whereas the use of wikis in the classroom is widely documented, the educational possibilities of Wikipedia itself are not so much. We offer a classification of participatory activities suitable for being carried out by the students in the development of the curricular contents. One of the most relevant aspects is the transformation of the critical and distrustful speech towards the Wikipedia in a direct knowledge of its scope, process of production and systems of quality control. In addition, it is a good opportunity to improve a widespread source of information among university undergraduates that has a real impact and for the students to develop a more critical and active use of information sources. 0 0
Quality evaluation of wikipedia articles through edit history and editor groups Se Wang
Mizuho Iwaihara
APWeb English 2011 0 0
Reference Blindness: The Influence of References on Trust in Wikipedia Teun Lucassen
Matthijs L. Noordzij
Jan Maarten Schraagen
WebSci Conference English 2011 In this study we show the influence of references on trust in information. We changed the contents of reference lists of Wikipedia articles in such a way that the new references were no longer in any sense related to the topic of the article. Furthermore, the length of the reference list was varied. College students were asked to evaluate the credibility of these articles. Only 6 out of 23 students noticed the manipulation of the references; 9 out of 23 students noticed the variations in length. These numbers are remarkably low, as 17 students indicated they considered references an important indicator of credibility. The findings suggest a highly heuristic manner of credibility evaluation. Systematic evaluation behavior was also observed in the experiment, but only of participants with low trust in Wikipedia in general. 7 0
Semantic relatedness for named entity disambiguation using a small wikipedia Izaskun Fernandez
Iñaki Alegria
Nerea Ezeiza
TSD English 2011 0 0
Social capital increases efficiency of collaboration among Wikipedia editors Keiichi Nemoto
Peter Gloor
Robert Laubacher
HT English 2011 0 0
Social mechanism of granting trust basing on polish wikipedia requests for adminship Piotr Turek
Justyna Spychala
Adam Wierzbicki
Piotr Gackowski
SocInfo English 2011 0 0
Social networks of Wikipedia Paolo Massa Hypertext English 2011 Wikipedia, the free online encyclopedia anyone can edit, is a live social experiment: millions of individuals volunteer their knowledge and time to collectively create it. It is hence interesting to try to understand how they do it. While most attention has concentrated on article pages, a lesser-known share of activity happens on user talk pages, Wikipedia pages where a message can be left for a specific user. These public conversations can be studied from a Social Network Analysis perspective in order to highlight the structure of the “talk” network. In this paper we focus on this preliminary extraction step by proposing different algorithms. We then empirically validate the differences in the networks they generate on the Venetian Wikipedia against the real network of conversations extracted manually by coding every message left on all user talk pages. The comparisons show that both the algorithms and the manual process contain inaccuracies that are intrinsic to the freedom and unpredictability of Wikipedia growth. Nevertheless, a precise description of the issues involved makes it possible to make informed decisions and to base empirical findings on reproducible evidence. Our goal is to lay the foundation for a solid computational sociology of wikis. For this reason we release the scripts encoding our algorithms as open source, along with some datasets extracted from Wikipedia conversations, in order to let other researchers replicate and improve our initial effort. 14 2
Supporting Multilingual Discussion for Wikipedia Translation Noriyuki Ishida
Toshiyuki Takasaki
Masanobu Ishimatsu
Toru Ishida
CULTURE-COMPUTING English 2011 0 0
Text clustering based on granular computing and wikipedia Liping Jing
Jian Yu
RSKT English 2011 0 0
The Past, Present, and Future of Wikipedia Shyong (Tony) K. Lam
John Riedl
Computer English 2011 0 0
The correlation between Wikipedia and knowledge sharing on job performance Shu-Mei Tseng
Jiao-Sheng Huang
Expert Syst. Appl. English 2011 0 0
The nature of historical representation on Wikipedia: Dominant or alterative historiography? Brendan Luyt J. Am. Soc. Inf. Sci. Technol. English 2011 0 0
Towards Tailored Semantic Annotation Systems from Wikipedia Shahad Kudama
Rafael Berlanga Llavori
Lisette Garcia-Moya
Victoria Nebot
Maria Jose Aramburu Cabo
DEXA English 2011 0 0
Towards automatic quality assurance in Wikipedia Maik Anderka
Benno Stein
Nedim Lipka
20th International Conference on World Wide Web (WWW 11) English 2011 Featured articles in Wikipedia stand for high information quality, and it has been found interesting to researchers to analyze whether and how they can be distinguished from "ordinary" articles. Here we point out that article discrimination falls far short of writer support or automatic quality assurance: Featured articles are not identified, but are made. Following this motto we compile a comprehensive list of information quality flaws in Wikipedia, model them according to the latest state of the art, and devise one-class classification technology for their identification. 0 0
Towards identifying arguments in Wikipedia pages Hoda Sepehri Rad
Denilson Barbosa
World Wide Web English 2011 0 0
Towards improving wikipedia as an image-rich encyclopaedia through analyzing appropriateness of images for an article Xinpeng Zhang
Yasuhito Asano
Masatoshi Yoshikawa
APWeb English 2011 0 0
Understanding and improving Wikipedia article discussion spaces Jodi Schneider
Alexandre Passant
John G. Breslin
SAC English 2011 0 0
Using Wikipedia to boost collaborative filtering techniques Gilad Katz
Nir Ofek
Bracha Shapira
Lior Rokach
Guy Shani
RecSys English 2011 0 0
Utilizing DVD players as low-cost offline Internet browsers Gaurav Paruthi
William Thies
Proceedings of the 2011 annual conference on Human factors in computing systems English 2011 In the developing world, computers and Internet access remain rare. However, there are other devices that can be used to deliver information, including TVs and DVD players. In this paper, we work to bridge this gap by delivering offline Internet content on DVD, for interactive playback on ordinary DVD players. Using the remote control, users can accomplish all of the major functions available in a Web browser, including navigation, hyperlinks, and search. As our driving application, we map the entirety of schools-wikipedia.org - encompassing 5,500 articles and 259,000 screens - to a double-layer DVD. We evaluate our system via a study of 20 low-income users in Bangalore, India. Using our DVD as reference, participants are able to answer factual questions with over 90% success. While most participants prefer to use a computer if one is available, for resource-poor environments the DVD platform could represent a viable and low-cost alternative. 0 0
Vandalism detection in Wikipedia: a high-performing, feature-rich model and its reduction through Lasso Sara Javanmardi
David W. McDonald
Cristina V. Lopes
WikiSym English 2011 0 0
Verbete Digital: Análise de Gênero na Wikipedia Vanessa Wendhausen Lima Revista L@el em (Dis-)curso Portuguese 2011 The aim of this paper is to report an analysis of the entry of Wikipedia as a digital genre. This theoretical work is based on the theory of genre as social action, developed by Carolyn Miller (1994). The analysis of the rhetorical organization of the corpus shows that this genre is a variation of the genre entry which is found in traditional encyclopedias, involving also variations in the textual aspects. 0 0
Visualizing author contribution statistics in Wikis using an edit significance metric Peter Kin-Fong Fong
Robert P. Biuk-Aghai
WikiSym English 2011 Wiki articles tend to be edited multiple times by multiple authors. This makes it difficult to identify individual authors’ contributions by human observation alone. We calculate an edit significance metric, using different weights for different types of edits, which reflect the different value placed on them by wiki community members. We then aggregate edit significance values and present them as visualizations to the user to aid in perceiving extent and patterns of contributions. 0 0
WP:clubhouse?: an exploration of Wikipedia's gender imbalance Shyong (Tony) K. Lam
Anuradha Uduwage
Zhenhua Dong
Shilad Sen
David R. Musicant
Loren Terveen
John Riedl
WikiSym English 2011 0 0
What Wikipedia deletes: characterizing dangerous collaborative content Andrew G. West
Insup Lee
WikiSym English 2011 Collaborative environments, such as Wikipedia, often have low barriers-to-entry in order to encourage participation. This accessibility is frequently abused (e.g., vandalism and spam). However, certain inappropriate behaviors are more threatening than others. In this work, we study contributions which are not simply "undone" -- but *deleted* from revision histories and public view. Such treatment is generally reserved for edits which: (1) present a legal liability to the host (e.g., copyright issues, defamation), or (2) present privacy threats to individuals (i.e., contact information). Herein, we analyze one year of Wikipedia's public deletion log and use brute-force strategies to learn about privately handled redactions. This permits insight about the prevalence of deletion, the reasons that induce it, and the extent of end-user exposure to dangerous content. While Wikipedia's approach is generally quite reactive, we find that copyright issues prove most problematic of those behaviors studied. 0 0
When the Wikipedians Talk: Network and Tree Structure of Wikipedia Discussion Pages David Laniado
Riccardo Tasso
Yana Volkovich
Andreas Kaltenbrunner
ICWSM English 2011 Talk pages play a fundamental role in Wikipedia as the place for discussion and communication. In this work we use the comments on these pages to extract and study three networks, corresponding to different kinds of interactions. We find evidence of a specific assortativity profile which differentiates article discussions from personal conversations. An analysis of the tree structure of the article talk pages allows to capture patterns of interaction, and reveals structural differences among the discussions about articles from different semantic areas. 0 2
WiGipedia: A Tool for Improving Structured Data in Wikipedia Svetlin Bostandjiev
John O'Donovan
Christopher Hall
Brynjar Gretarsson
Tobias Hollerer
ICSC English 2011 0 0
Wiki architectures as social translucence enablers Stephanie Gokhman
David W. McDonald
Mark Zachry
WikiSym English 2011 0 0
WikiLit: Collecting the Wiki and Wikipedia Literature Phoebe Ayers
Reid Priedhorsky
WikiSym English 2011 This workshop has three key goals. First, we will examine existing and proposed systems for collecting and analyzing the research literature about wikis. Second, we will discuss the challenges in building such a system and will engage participants to design a sustainable collaborative system to achieve this goal. Finally, we will provide a forum to build upon ongoing wiki community discussions about problems and opportunities in finding and sharing the wiki research literature. 1 0
WikiTrip: animated visualization over time of gender and geo-location of wikipedians who edited a page Paolo Massa
Maurizio Napolitano
Federico Scrinzi
English 2011 0 0
Wikipedia as a Data Source for Political Scientists: Accuracy and Completeness of Coverage Adam R. Brown PS: Political Science & Politics English 2011 In only 10 years, Wikipedia has risen from obscurity to become the dominant information source for an entire generation. However, any visitor can edit any page on Wikipedia, which hardly fosters confidence in its accuracy. In this article, I review thousands of Wikipedia articles about candidates, elections, and officeholders to assess both the accuracy and the thoroughness of Wikipedia's coverage. I find that Wikipedia is almost always accurate when a relevant article exists, but errors of omission are extremely frequent. These errors of omission follow a predictable pattern. Wikipedia's political coverage is often very good for recent or prominent topics but is lacking on older or more obscure topics. 0 1
Wikipedia based news video topic modeling for information extraction Sujoy Roy
Mun-Thye Mak
Kong Wah Wan
MMM English 2011 0 0
Wikipedia category visualization using radial layout Robert P. Biuk-Aghai
Felix Hon Hou Cheang
WikiSym English 2011 Wikipedia is a large and popular daily information source for millions of people. How are articles distributed by topic area, and what is the semantic coverage of Wikipedia? Using manual methods it is impractical to determine this. We present the design of an information visualization tool that produces overview diagrams of Wikipedia’s articles distributed according to category relationships, and show examples of visualizing English Wikipedia. 0 0
Wikipedia world map: method and application of map-like wiki visualization Cheong-Iao Pang
Robert P. Biuk-Aghai
WikiSym English 2011 Wikis are popular platforms for collaborative editing. In volunteer-driven wikis such as Wikipedia, which attracts millions of authors editing articles on a diverse range of topics, contributors’ editing activity results in certain semantic coverage of topic areas. Obtaining an understanding of a given wiki’s semantic coverage is not easy. To solve this problem, we have devised a method for visualizing a wiki in a way similar to a geographic map. We have applied our method to Wikipedia, and generated visualizations for several Wikipedia language editions. This paper presents our wiki visualization method and its application. 0 0
Wikipedia's "Neutral Point of View": Settling Conflict through Ambiguity Sorin Adam Matei
Caius Dobrescu
The Information Society English 2011 0 0
Wikipedia: Example for a future Electronic Democracy?: Decision, Discipline and Discourse in the Collaborative Encyclopaedia Sylvain Firer-Blaess Studies in Social and Political Thought English 2011 This article describes the mechanisms of a successful product of the Internet involving mass collaboration, namely, the online encyclopaedia Wikipedia.

In the first part of the paper, the author analyses the decision making process, including debates and consensus, which Wikipedia employs, and makes a connection with the Habermasian model of rational discourse. In the second part, he analyses the disciplines (in the Foucauldian sense) which underlie and permit this decision making process. He finds that, on the theoretical plane, despite the harsh criticisms Habermas claimed against the writings of Foucault, we can see a rather complementary relation between the establishing of rational discourse in Wikipedia and the effects of its discipline. In a third part, the author shows the resistances that face the decision-making process and the disciplines, and considers the reactions that have emerged against such resistances. These findings lead on to a discussion of the normativity of Foucauldian disciplines and the possibility of their heterogeneity.

Finally, the author examines the possible implementations of the Wikipedia system to electronic democracy projects.
0 0
"Wikipedias" y biblioteca pública. Participar en la información local digital a través de "localpedias" José-Antonio Gómez-Hernández Anuario ThinkEPI Spanish December 2010 This paper justifies participation by public libraries in designing and publishing in “localpedias” as a way to promote collaboration in the creation of local content. For this purpose, the “localpedia” concept is explained and some of the main Spanish localpedia experiences described. Finally, some difficulties in consolidating this way of creating and sharing local knowledge are discussed. 3 0
From Encyclopædia Britannica to Wikipedia: Generational differences in the perceived credibility of online encyclopedia information Andrew J. Flanagin
Miriam J. Metzger
Information, Communication & Society English 18 November 2010 This study examined the perceived credibility of user-generated (i.e. Wikipedia) versus more expertly provided online encyclopedic information (i.e. Citizendium, and the online version of the Encyclopædia Britannica) across generations. Two large-scale surveys with embedded quasi-experiments were conducted: among 11–18-year-olds living at home and among adults 18 years and older. Results showed that although use of Wikipedia is common, many people (particularly adults) do not truly comprehend how Wikipedia operates in terms of information provision, and that while people trust Wikipedia as an information source, they express doubt about the appropriateness of doing so. A companion quasi-experiment found that both children and adults assess information to be more credible when it originates or appears to originate from Encyclopædia Britannica. In addition, children rated information from Wikipedia to be less believable when they viewed it on Wikipedia’s site than when that same information appeared on either Citizendium’s site or on Encyclopædia Britannica’s site. Indeed, content originating from Wikipedia was perceived by children as least credible when it was shown on a Wikipedia page, yet the most credible when it was shown on the page of Encyclopædia Britannica. The practical and theoretical implications of these results are discussed.
0 1
A Wikipédia e o discurso de/sobre o conhecimento Gláucia da Silva Henge IX Encontro do Círculo de Estudos Linguísticos do Sul Portuguese October 2010 1 0
As relações de poder entre editores da Wikipédia Paulo Henrique Souto Maior Serrano IX Encontro do Círculo de Estudos Linguísticos do Sul Portuguese October 2010 The collaborative encyclopedia Wikipedia is guided by several policies, recommendations and standards, developed by its community of users from five basic principles: 1) encyclopedism, 2) the neutral point of view, 3) free license, 4) code of conduct, 5) freedom in the rules (Wikipedia: 2009b). This article analyses, through Greimasian and tensive semiotics, the application of the five principles in the discussion of conflicting entries. 0 0
Fragmentação e wikificação: a morte de Zilda Arns na cobertura do G1 e da Wikipédia em português Carlos Frederico de Brito d’Andréa Anais do XXXIII Congresso Brasileiro de Ciências da Comunicação Portuguese September 2010 7 1
STiki: An Anti-Vandalism Tool for Wikipedia Using Spatio-Temporal Analysis of Revision Metadata Andrew G. West
Sampath Kannan
Insup Lee
WikiSym English July 2010 STiki is an anti-vandalism tool for Wikipedia. Unlike similar tools, STiki does not rely on natural language processing (NLP) over the article or diff text to locate vandalism. Instead, STiki leverages spatio-temporal properties of revision metadata. The feasibility of utilizing such properties was demonstrated in our prior work, which found they perform comparably to NLP-efforts while being more efficient, robust to evasion, and language independent. STiki is a real-time, on-Wikipedia implementation based on these properties. It consists of (1) a server-side processing engine that examines revisions, scoring the likelihood each is vandalism, and (2) a client-side GUI that presents likely vandalism to end-users for definitive classification (and if necessary, reversion on Wikipedia). Our demonstration will provide an introduction to spatio-temporal properties, demonstrate the STiki software, and discuss alternative research uses for the open-source code. 0 0
Semantic tagging of and semantic enhancements to systematics papers: ZooKeys working examples Lyubomir Penev
Donat Agosti
Teodor Georgiev
Terry Catapano
Jeremy Miller
Vladimir Blagoderov
David Roberts
Vincent Smith
Irina Brake
Simon Rycroft
Ben Scott
Norman Johnson
Robert Morris
Guido Sautter
Vishwas Chavan
Tim Robertson
David Remsen
Pavel Stoev
Cynthia Parr
Sandra Knapp
W. John Kress
Chris Thompson
Terry Erwin
ZooKeys English June 2010 The concept of semantic tagging and its potential for semantic enhancements to taxonomic papers is outlined and illustrated by four exemplar papers published in the present issue of ZooKeys. The four papers were created in different ways: (i) written in Microsoft Word and submitted as non-tagged manuscript (doi: 10.3897/zookeys.50.504); (ii) generated from Scratchpads and submitted as XML-tagged manuscripts (doi: 10.3897/zookeys.50.505 and doi: 10.3897/zookeys.50.506); (iii) generated from an author’s database (doi: 10.3897/zookeys.50.485) and submitted as XML-tagged manuscript. XML tagging and semantic enhancements were implemented during the editorial process of ZooKeys using the Pensoft Mark Up Tool (PMT), specially designed for this purpose. The XML schema used was TaxPub, an extension to the Document Type Definitions (DTD) of the US National Library of Medicine Journal Archiving and Interchange Tag Suite (NLM). The following innovative methods of tagging, layout, publishing and disseminating the content were tested and implemented within the ZooKeys editorial workflow: (1) highly automated, fine-grained XML tagging based on TaxPub; (2) final XML output of the paper validated against the NLM DTD for archiving in PubMedCentral; (3) bibliographic metadata embedded in the PDF through XMP (Extensible Metadata Platform); (4) PDF uploaded after publication to the Biodiversity Heritage Library (BHL); (5) taxon treatments supplied through XML to Plazi; (6) semantically enhanced HTML version of the paper encompassing numerous internal and external links and linkouts, such as: (i) visualization of main tag elements within the text (e.g., taxon names, taxon treatments, localities, etc.); (ii) internal cross-linking between paper sections, citations, references, tables, and figures; (iii) mapping of localities listed in the whole paper or within separate taxon treatments; (v) taxon names autotagged, dynamically mapped and linked through the Pensoft Taxon Profile (PTP) to large international database services and indexers such as Global Biodiversity Information Facility (GBIF), National Center for Biotechnology Information (NCBI), Barcode of Life (BOLD), Encyclopedia of Life (EOL), ZooBank, Wikipedia, Wikispecies, Wikimedia, and others; (vi) GenBank accession numbers autotagged and linked to NCBI; (vii) external links of taxon names to references in PubMed, Google Scholar, Biodiversity Heritage Library and other sources. With the launching of the working example, ZooKeys becomes the first taxonomic journal to provide a complete XML-based editorial, publication and dissemination workflow implemented as a routine and cost-efficient practice. It is anticipated that XML-based workflow will also soon be implemented in botany through PhytoKeys, a forthcoming partner journal of ZooKeys. The semantic markup and enhancements are expected to greatly extend and accelerate the way taxonomic information is published, disseminated and used. 0 1
Detecting Wikipedia Vandalism via Spatio-Temporal Analysis of Revision Metadata Andrew G. West
Sampath Kannan
Insup Lee
EUROSEC English April 2010 Blatantly unproductive edits undermine the quality of the collaboratively-edited encyclopedia, Wikipedia. They not only disseminate dishonest and offensive content, but force editors to waste time undoing such acts of vandalism. Language-processing has been applied to combat these malicious edits, but as with email spam, these filters are evadable and computationally complex. Meanwhile, recent research has shown spatial and temporal features effective in mitigating email spam, while being lightweight and robust. In this paper, we leverage the spatio-temporal properties of revision metadata to detect vandalism on Wikipedia. An administrative form of reversion called rollback enables the tagging of malicious edits, which are contrasted with nonoffending edits in numerous dimensions. Crucially, none of these features require inspection of the article or revision text. Ultimately, a classifier is produced which flags vandalism at performance comparable to the natural-language efforts we intend to complement (85% accuracy at 50% recall). The classifier is scalable (processing 100+ edits a second) and has been used to locate over 5,000 manually-confirmed incidents of vandalism outside our labeled set. 9 3
A Comparison of Approaches for Geospatial Entity Extraction from Wikipedia Daryl Woodward
Jeremy Witmer
Jugal Kalita
ICSC English 2010 0 0
A Cultural and Political Economy of Web 2.0 Robert W. Gehl English 2010 In this dissertation, I explore Web 2.0, an umbrella term for Web-based software and services such as blogs, wikis, social networking, and media sharing sites. This range of Web sites is complex, but is tied together by one key feature: the users of these sites and services are expected to produce the content included in them. That is, users write and comment upon blogs, produce the material in wikis, make connections with one another in social networks, and produce videos in media sharing sites. This has two implications. First, the increase of user-led media production has led to proclamations that mass media, hierarchy, and authority are dead, and that we are entering into a time of democratic media production. Second, this mode of media production relies on users to supply what was traditionally paid labor. To illuminate this, I explore the popular media discourses which have defined Web 2.0 as a progressive, democratic development in media production. I consider the pleasures that users derive from these sites. I then examine the technical structure of Web 2.0. Despite the arguments that present Web 2.0 as a mass appropriation of the means of media production, I have found that Web 2.0 site owners have been able to exploit users' desires to create content and control media production. Site owners do this by deploying a dichotomous structure. In a typical Web 2.0 site, there is a surface, where users are free to produce content and make affective connections, and there is a hidden depth, where new media capitalists convert user-generated content into exchange-values. Web 2.0 sites seek to hide exploitation of free user labor by limiting access to this depth. This dichotomous structure is made clearer if it is compared to the one Web 2.0 site where users have largely taken control of the products of their labor: Wikipedia. 
Unlike many other sites, Wikipedia allows users to see into and determine the legal, technical, and cultural depths of that site. I conclude by pointing to the different cultural formations made possible by eliminating the barrier between surface and depth in Web software architecture. 13 0
A Framework for Co-classification of Articles and Users in Wikipedia Lei Liu
Pang-Ning Tan
WI-IAT English 2010 0 0
A Statistical Approach to the Impact of Featured Articles in Wikipedia Antonio J. Reinoso
Felipe Ortega
Jesús M. González-Barahona
Israel Herraiz
KEOD English 2010 This paper presents an empirical study on the impact of featured articles on the attention that Wikipedia’s articles attract, and how this behavior differs in different editions of Wikipedia. The study is based on the analysis of the log lines registered by the Wikimedia Foundation Squid servers after having sent the appropriate content in response to the corresponding request submitted by any Wikipedia user. The analysis has been conducted regarding the six most visited editions of Wikipedia and has involved more than 4,100 million log lines corresponding to the traffic of September, October and November 2009. The methodology has mainly consisted of parsing the requests sent by the users and subsequently filtering them according to the study directives. Relevant information fields have finally been stored in a database for persistence and further characterization. The main results of this paper are twofold: it shows how to use the traffic log to extract information about the use of Wikipedia, which is a novel research approach without precedent in the research community, and it analyzes whether or not the featured articles mechanism succeeds in attracting more attention. 6 0
Adhocratic Governance in the Internet Age: A Case of Wikipedia Piotr Konieczny English 2010 In recent years, a new realm has appeared for the study of political and sociological phenomena: the Internet. This article will analyze the decision-making processes of one of the largest online communities, Wikipedia. Founded in 2001, Wikipedia, now among the top-10 most popular sites on the Internet, has succeeded in attracting and organizing millions of volunteers and creating the world's largest encyclopedia. To date, however, little study has been done of Wikipedia's governance. There is substantial confusion about its decision-making structure. The organization's governance has been compared to many decision-making and political systems, from democracy to dictatorship, from bureaucracy to anarchy. It is the purpose of this article to go beyond the earlier simplistic descriptions of Wikipedia's governance in order to advance the study of online governance, and of organizations more generally. As the evidence will show, while Wikipedia's governance shows elements common to many traditional governance models, it appears to be closest to the organizational structure known as adhocracy. 0 1
An Efficient Method for Tagging a Query with Category Labels Using Wikipedia towards Enhancing Search Engine Results Milad Alemzadeh
Fakhri Karray
WI-IAT English 2010 0 0
Annotate Wikipedia with Flickr images: concepts and case study Jie Xiao
Qi Tian
ICIMCS English 2010 0 0
Applying wikipedia-based explicit semantic analysis for query-biased document summarization Yunqing Zhou
Zhongqi Guo
Peng Ren
Yong Yu
ICIC English 2010 0 0
Auto-organização e processos editoriais na Wikipédia: uma análise à luz de Michel Debrun Carlos Frederico de Brito d’Andréa Leitura e escrita em movimento Portuguese 2010 0 1
Beyond Wikipedia: Coordination and Conflict in Online Production Groups Aniket Kittur
Robert E. Kraut
Computer-Supported Cooperative Work English 2010 Online production groups have the potential to transform the way that knowledge is produced and disseminated. One of the most widely used forms of online production is the wiki, which has been used in domains ranging from science to education to enterprise. We examined the development of and interactions between coordination and conflict in a sample of 6811 wiki production groups. We investigated the influence of four coordination mechanisms: intra-article communication, inter-user communication, concentration of workgroup structure, and policy and procedures. We also examined the growth of conflict, finding the density of users in an information space to be a significant predictor. Finally, we analyzed the effectiveness of the four coordination mechanisms on managing conflict, finding differences in how each scaled to large numbers of contributors. Our results suggest that coordination mechanisms effective for managing conflict are not always the same as those effective for managing task quality, and that designers must take into account the social benefits of coordination mechanisms in addition to their production benefits. 0 2
Building Bilingual Parallel Corpora Based on Wikipedia Mehdi Mohammadi
Nasser GhasemAghaee
ICCEA English 2010 0 1
Centroid-based Classification Enhanced with Wikipedia Abdullah Bawakid
Mourad Oussalah
ICMLA English 2010 0 0
Coisas velhas em coisas novas: novas “velhas tecnologias” Pedro Demo Ciência da Informação Portuguese January 2010 The objective of this article is to present an up-to-date discussion about the extraordinary technological innovations, mainly the new technologies underlining both breaches and continuities. Technologies are supposed to present a sense of convergence as well as continuities. Hackers and others who propose free software are in favor of liberty and liberation, considering computer and internet as arenas of freedom. This is only partly correct, because these hackers who consider themselves as libertarians submit themselves to narrow-minded structures of power (for example, autocratic bosses). Internet is state-wide instead of being worldwide. France has imposed changes in the contents of sites. China does not allow a free flow of information. That aura of beginning liberty, granted as a structure of the computer for being customized and formatted is strongly contested by illegal and immoral flow, by introduction of spam and marketing, as well as by virus contamination. The so called "generative internet" is losing ground on account of the pressure of users who want guaranteed end-products, easier to be handled, for avoiding abuse of freedom. The case of Wikipédia is remarkable. Continuous wars of publishing unsettle the environment (although this does not hinder the production of a large and original encyclopedia). 13 0
Cross-cultural analysis of the Wikipedia community Noriko Hara
Pnina Shachaf
Khe Foon Hew
J. Am. Soc. Inf. Sci. Technol. English 2010 0 2
Crowdsourcing a Wikipedia Vandalism Corpus Martin Potthast SIGIR English 2010 We report on the construction of the PAN Wikipedia vandalism corpus, PAN-WVC-10, using Amazon’s Mechanical Turk. The corpus compiles 32 452 edits on 28 468 Wikipedia articles, among which 2 391 vandalism edits have been identified. 753 human annotators cast a total of 193 022 votes on the edits, so that each edit was reviewed by at least 3 annotators, whereas the achieved level of agreement was analyzed in order to label an edit as “regular” or “vandalism.” The corpus is available free of charge. 6 1
Crowdsourcing and Open Access: Collaborative Techniques for Disseminating Legal Materials and Scholarship Timothy K. Armstrong Santa Clara Computer and High Technology Law Journal English 2010 This short essay surveys the state of open access to primary legal source materials (statutes, judicial opinions and the like) and legal scholarship. The ongoing digitization phenomenon (illustrated, although by no means typified, by massive scanning endeavors such as the Google Books project and the Library of Congress's efforts to digitize United States historical documents) has made a wealth of information, including legal information, freely available online, and a number of open-access collections of legal source materials have been created. Many of these collections, however, suffer from similar flaws: they devote too much effort to collecting case law rather than other authorities, they overemphasize recent works (especially those originally created in digital form), they do not adequately hyperlink between related documents in the collection, their citator functions are haphazard and rudimentary, and they do not enable easy user authentication against official reference sources. The essay explores whether some of these problems might be alleviated by enlarging the pool of contributors who are working to bring paper records into the digital era. The same "peer production" process that has allowed far-flung communities of volunteers to build large-scale informational goods like the Wikipedia encyclopedia or the Linux operating system might be harnessed to build a digital library. The essay critically reviews two projects that have sought to "crowdsource" proofreading and archiving of texts: Distributed Proofreaders, a project frequently held up as a model in the academic literature on peer production; and Wikisource, a sister site of Wikipedia that improves on Distributed Proofreaders in a number of ways. 
The essay concludes by offering a few illustrations meant to show the potential for using Wikisource as an open-access repository for primary source materials and scholarship, and considers some possible drawbacks of the crowdsourced approach. 4 1
Deriving a categorical vector space model for web page recommendations based on Wikipedia's content Pei-Chia Chang
Luz M. Quiroga
ASIS&T English 2010 0 0
Do Wikipedians follow domain experts?: a domain-specific study on Wikipedia knowledge building Yi Zhang
Aixin Sun
Anwitaman Datta
Kuiyu Chang
Ee-Peng Lim
JCDL English 2010 0 0
Educational Tool Based on Topology and Evolution of Hyperlinks in the Wikipedia Lauri Lahti ICALT English 2010 0 0
Efficient wikipedia-based semantic interpreter by exploiting top-k processing Jong Wook Kim
Ashwin Kashyap
Dekai Li
Sandilya Bhamidipati
CIKM English 2010 0 0
Ensino de línguas e produção de texto: editando wikis e a Wikipédia Ana Elisa Ferreira Ribeiro
Carlos Frederico de Brito d’Andréa
Línguas na Web: links entre ensino e aprendizagem Portuguese 2010 0 0
Entity classification by bag of Wikipedia articles PIKM English 2010 0 0
Entity ranking using Wikipedia as a pivot Rianne Kaptein
Pavel Serdyukov
Arjen De Vries
Jaap Kamps
CIKM English 2010 0 0
Entity-relationship queries over wikipedia Xiaonan Li
Chengkai Li
Cong Yu
SMUC English 2010 0 0
Escalada do conflito em processos colaborativos online: uma análise do verbete Web 2.0 da Wikipédia Aline de Campos Intexto Portuguese January 2010 Collaborative actions are naturally interlocking with conflict processes. In other words, collaboration can lead to conflicts and vice versa. This article discusses the implications of this mutual influence, checking the dynamics of conflict escalation often present in online collaborative practices. To do so, it examines the trajectory of building an entry from Wikipedia, the Free Encyclopedia, evaluating the tensions introduced in the pages of debate over the collective and computer-mediated production. 0 0
Extracting the gist of social network services using Wikipedia Akiyo Nadamoto
Eiji Aramaki
Takeshi Abekawa
Yohei Murakami
IiWAS English 2010 0 0
Facetedpedia: enabling query-dependent faceted search for wikipedia Ning Yan
Chengkai Li
Senjuti B. Roy
Rakesh Ramegowda
Gautam Das
CIKM English 2010 0 0
Frequent itemset based hierarchical document clustering using Wikipedia as external knowledge G. V. R. Kiran
Ravi Shankar
Vikram Pudi
KES English 2010 0 0
Identifying featured articles in wikipedia: writing style matters Nedim Lipka
Benno Stein
World Wide Web English 2010 0 0
Improving Human-Agent Conversations by Accessing Contextual Knowledge from Wikipedia Alexa Breuing WI-IAT English 2010 0 0
Improving Question Answering Based on Query Expansion with Wikipedia Yajie Miao
Xin Su
Chunping Li
ICTAI English 2010 0 0
Improving Wikipedia's credibility: References and citations in a sample of history articles Brendan Luyt
Daniel Tan
J. Am. Soc. Inf. Sci. Technol. English 2010 This study evaluates how well the authors of Wikipedia history articles adhere to the site’s policy of assuring verifiability through citations. It does so by examining the references and citations of a subset of country histories. The findings paint a dismal picture. Not only are many claims not verified through citations, those that are suffer from the choice of references used. Many of these are from only a few US government Websites or news media and few are to academic journal material. Given these results, one response would be to declare Wikipedia unsuitable for serious reference work. But another option emerges when we jettison technological determinism and look at Wikipedia as a product of a wider social context. Key to this context is a world in which information is bottled up as commodities requiring payment for access. Equally important is the problematic assumption that texts are undifferentiated bearers of knowledge. Those involved in instructional programs can draw attention to the social nature of texts to counter these assumptions and by so doing create an awareness for a new generation of Wikipedians and Wikipedia users of the need to evaluate texts (and hence citations) in light of the social context of their production and use. 11 1
Learning about team collaboration from Wikipedia edit history Adam Wierzbicki
Piotr Turek
Radoslaw Nielek
WikiSym English 2010 0 0
Linking topics of news and blogs with wikipedia for complementary navigation Yuki Sato
Daisuke Yokomoto
Hiroyuki Nakasaki
Mariko Kawaba
Takehito Utsuro
Tomohiro Fukuhara
English 2010 0 0
MENTA: inducing multilingual taxonomies from wikipedia Gerard de Melo
Gerhard Weikum
CIKM English 2010 0 0
Mining the Factors Affecting the Quality of Wikipedia Articles Kewen Wu
Qinghua Zhu
Yuxiang Zhao
Hua Zheng
ISME English 2010 0 0
Mining wikipedia and yahoo! answers for question expansion in opinion QA Yajie Miao
Chunping Li
PAKDD English 2010 0 0
Modeling user reputation in wikis Sara Javanmardi
Cristina Lopes
Pierre Baldi
Stat. Anal. Data Min. English 2010 0 2
Multilingual sentence alignment from Wikipedia as multilingual comparable corpora Min-Hsiang Li
Vitaly Klyuev
Shih-Hung Wu
HC English 2010 0 0
O papel do sujeito em uma enciclopédia online Gláucia da Silva Henge Organon Portuguese 2010 This text seeks to review the notion of subject from the perspective of discourse analysis to investigate the established designation for those who edit the online encyclopedia Wikipedia. For this, other important notions are rescued, such as: image, discoursive place, meaning and memory. Going through the formulations components of the encyclopedia, we can see the slides of meaning and the game of powers between the speeches, been emerging in the subjectivity of speech through an idealization. The Wikipedian would be, then, the correspondent at the plan of designation of an imaginary construction of the figure of an internet user as the one who applies all web resources and who is free to express himself, exchange with other internet users and who is an encyclopedist inside the virtual universe. 0 0
Overview of the INEX 2009 link the wiki track Wei Che Huang
Shlomo Geva
Andrew Trotman
INEX English 2010 0 0
Proposal of Spatiotemporal Data Extraction and Visualization System Based on Wikipedia for Application to Earth Science Akihiro Okamoto
Shohei Yokoyama
Naoki Fukuta
Hiroshi Ishikawa
ICIS English 2010 0 0
Rating the raters: a reputation system for wiki-like domains Alexis Velarde Pantola
Susan Pancho-Festin
Florante Salvador
SIN English 2010 0 0
Readers are not free-riders: reading as a form of participation on Wikipedia Judd Antin
Coye Cheshire
Computer-Supported Cooperative Work English 2010 The success of Wikipedia as a large-scale collaborative effort has spurred researchers to examine the motivations and behaviors of Wikipedia's participants. However, this research has tended to focus on active involvement rather than more common forms of participation such as reading. In this paper we argue that Wikipedia's readers should not all be characterized as free-riders -- individuals who knowingly choose to take advantage of others' effort. Furthermore, we illustrate how readers provide a valuable service to Wikipedia. Finally, we use the notion of legitimate peripheral participation to argue that reading is a gateway activity through which newcomers learn about Wikipedia. We find support for our arguments in the results of a survey of Wikipedia usage and knowledge. Implications for future research and design are discussed. 0 3
Social Network Mining Based on Wikipedia Fangfang Yang
Zhiming Xu
Sheng Li
Zhikai Xu
IALP English 2010 0 0
Socialization tactics in Wikipedia and their effects Boreum Choi
Kira Alexander
Robert E. Kraut
John M. Levine
English 2010 Socialization of newcomers is critical for both conventional and online groups. It helps groups perform effectively and the newcomers develop commitment. However, little empirical research has investigated the impact of specific socialization tactics on newcomers' commitment to online groups. We examined WikiProjects, subgroups in Wikipedia organized around working on common topics or tasks. In study 1, we identified the seven socialization tactics used most frequently: invitations to join, welcome messages, requests to work on project-related tasks, offers of assistance, positive feedback on a new member's work, constructive criticism, and personal-related comments. In study 2, we examined their impact on newcomers' commitment to the project. Whereas most newcomers contributed fewer edits over time, the declines were slowed or reversed for those socialized with welcome messages, assistance, and constructive criticism. In contrast, invitations led to steeper declines in edits. These results suggest that different socialization tactics play different roles in socializing new members in online groups compared to offline ones. 0 1
Statistical measure of quality in Wikipedia Sara Javanmardi
Cristina Lopes
SOMA English 2010 0 1
TAGME: on-the-fly annotation of short text fragments (by wikipedia entities) Paolo Ferragina
Ugo Scaiella
CIKM English 2010 0 0
Tag disambiguation through Flickr and Wikipedia Anastasia Stampouli
Eirini Giannakidou
Athena Vakali
DASFAA English 2010 0 0
Teaching with Wikipedia and other Wikimedia foundation wikis Piotr Konieczny WikiSym English 2010 0 0
Testing an integrative theoretical model of knowledge-sharing behavior in the context of Wikipedia Hichang Cho
MeiHui Chen
Siyoung Chung
J. Am. Soc. Inf. Sci. Technol. English 2010 0 0
The work of sustaining order in Wikipedia: the banning of a vandal R. Stuart Geiger
David Ribes
English 2010 In this paper, we examine the social roles of software tools in the English-language Wikipedia, specifically focusing on autonomous editing programs and assisted editing tools. This qualitative research builds on recent research in which we quantitatively demonstrate the growing prevalence of such software in recent years. Using trace ethnography, we show how these often-unofficial technologies have fundamentally transformed the nature of editing and administration in Wikipedia. Specifically, we analyze "vandal fighting" as an epistemic process of distributed cognition, highlighting the role of non-human actors in enabling a decentralized activity of collective intelligence. In all, this case shows that software programs are used for more than enforcing policies and standards. These tools enable coordinated yet decentralized action, independent of the specific norms currently in force. 0 2
Timely YAGO: harvesting, querying, and visualizing temporal knowledge from Wikipedia Yafang Wang
Mingjie Zhu
Lizhen Qu
Marc Spaniol
Gerhard Weikum
EDBT English 2010 0 0
Ubiquitous Wikipedia on Handheld Device for Mobile Learning Shih-Hung Wu
Min-Xiang Li
Ping-che Yang
Tsun Ku
WMUTE English 2010 0 0
What cognition does for Wikis Rut Jesus WikiSym English 2010 0 0
WikiPics: multilingual image search based on Wiki-mining Daniel Kinzler WikiSym English 2010 0 0
WikiPop - Personalized Event Detection System Based on Wikipedia Page View Statistics Marek Ciglan
Kjetil Nørvåg
English 2010 In this paper, we describe WikiPop, a system designed to detect significant increase of popularity of topics related to users' interests. We exploit Wikipedia page view statistics to identify concepts with significant increase of the interest from the public. Daily, there are thousands of articles with increased popularity; thus, a personalization is in order to provide the user only with results related to his/her interest. The WikiPop system allows a user to define a context by stating a set of Wikipedia articles describing topics of interest. The system is then able to search, for the given date, for popular topics related to the user defined context. 0 0
Wikipedia-based semantic smoothing for the language modeling approach to information retrieval Xinhui Tu
Tingting He
Long Chen
Jing Luo
Maoyuan Zhang
ECIR English 2010 0 0
\&\ Joseph M. Reagle
Jr.
New Rev. Hypermedia Multimedia English 2010 0 0
Textual curators and writing machines: authorial agency in encyclopedias, print to digital Krista A. Kennedy English July 2009 Wikipedia is often discussed as the first of its kind: the first massively collaborative, Web-based encyclopedia that belongs to the public domain. While it’s true that wiki technology enables large-scale, distributed collaborations in revolutionary ways, the concept of a collaborative encyclopedia is not new, and neither is the idea that private ownership might not apply to such documents. More than 275 years ago, in the preface to the 1728 edition of his Cyclopædia, Ephraim Chambers mused on the intensely collaborative nature of the volumes he was about to publish. His thoughts were remarkably similar to contemporary intellectual property arguments for Wikipedia, and while the composition processes involved in producing these texts are influenced by the available technologies, they are also unexpectedly similar. This dissertation examines issues of authorial agency in these two texts and shows that the “Author Construct” is not static across eras, genres, or textual technologies. In contrast to traditional considerations of the poetic author, the encyclopedic author demonstrates a different form of authorial agency that operates within strict genre conventions and does not place a premium on originality. This and related variations challenge contemporary ideas concerning the divide between print and digital authorship as well as the notion that new media intellectual property arguments are without historical precedent. 25 0
Fixing the floating gap: The online encyclopaedia Wikipedia as a global memory place Christian Pentzold Memory Studies English May 2009 The article proposes to interpret the web-based encyclopaedia Wikipedia as a global memory place. After presenting the core elements and basic characteristics of wikis and Wikipedia respectively, the article discusses four related issues of social memory studies: collective memory, communicative and cultural memory, 'memory places' and the 'floating gap'. In a third step, these theoretical premises are connected to the understanding of discourse as social cognition. Fourth, comparison is made between the potential of the World Wide Web as cyberspace for collective remembrance and the obstacles that stand in its way. On this basis, the article argues that Wikipedia presents a global memory place where memorable elements are negotiated. Its complex processes of discussion and article creation are a model of the discursive fabrication of memory. Thus, they can be viewed and analysed as the transition, the 'floating gap' between communicative and collective frames of memory. 6 2
Um método automático para estimativa da qualidade de enciclopédias colaborativas on-line: um estudo de caso sobre a Wikipédia Daniel Hasan Dalip Portuguese April 2009 The old dream of a universal repository containing all the human knowledge and culture is becoming possible through the Internet and the Web. Moreover, this is happening with the direct, collaborative participation of people. Wikipedia is a great example. It is an enormous repository of information with free access and edition, created by the community in a collaborative manner. However, this large amount of information, made available democratically and virtually without any control, raises questions about its relative quality. In this work we explore a significant number of quality indicators, some of them proposed by us and used here for the first time, and study their capability to assess the quality of Wikipedia articles. Furthermore, we explore machine learning techniques to combine these quality indicators into one single assessment judgment. Through experiments, we show that the most important quality indicators are the easiest ones to extract on an open digital library, namely, textual features related to length, structure and style. We were also able to determine which indicators did not contribute significantly to the quality assessment. These were, coincidentally, the most complex features, such as those based on link analysis. Finally, we compare our combination method with state-of-the-art solutions and show significant improvements in terms of effective quality prediction. 11 0
What’s on Wikipedia and What’s Not... ? Cindy Royal
Deepina Kapila
Social Science Computer Review English February 2009 The World Wide Web continues to grow closer to achieving the vision of becoming the repository of all human knowledge, as features and applications that support user-generated content become more prevalent. Wikipedia is fast becoming an important resource for news and information. It is an online information source that is increasingly used as the first, and sometimes only, stop for online encyclopedic information. Using a method employed by Tankard and Royal to judge completeness of Web content, completeness of information on Wikipedia is assessed. Some topics are covered more comprehensively than others, and the predictors of these biases include recency, importance, population, and financial wealth. Wikipedia is more a socially produced document than a value-free information source. It reflects the viewpoints, interests, and emphases of the people who use it. 0 1
"All You Can Eat" Ontology-Building: Feeding Wikipedia to Cyc Samuel Sarjant
Catherine Legg
Michael Robinson
Olena Medelyan
WI-IAT English 2009 In order to achieve genuine web intelligence, building some kind of large general machine-readable conceptual scheme (i.e. ontology) seems inescapable. Yet the past 20 years have shown that manual ontology-building is not practicable. The recent explosion of free user-supplied knowledge on the Web has led to great strides in automatic ontology-building, but quality-control is still a major issue. Ideally one should automatically build onto an already intelligent base. We suggest that the long-running Cyc project is able to assist here. We describe methods used to add 35K new concepts mined from Wikipedia to collections in ResearchCyc entirely automatically. Evaluation with 22 human subjects shows high precision both for the new concepts’ categorization, and their assignment as individuals or collections. Most importantly we show how Cyc itself can be leveraged for ontological quality control by ‘feeding’ it assertions one by one, enabling it to reject those that contradict its other knowledge. 0 0
"edit this page": the socio-technological infrastructure of a wikipedia article Shaun P. Slattery SIGDOC English 2009 0 0
A Jury of Your Peers: Quality, Experience and Ownership in Wikipedia Aaron Halfaker
Aniket Kittur
Robert E. Kraut
John Riedl
WikiSym English 2009 Wikipedia is a highly successful example of what mass collaboration in an informal peer review system can accomplish. In this paper, we examine the role that the quality of the contributions, the experience of the contributors and the ownership of the content play in the decisions over which contributions become part of Wikipedia and which ones are rejected by the community. We introduce and justify a versatile metric for automatically measuring the quality of a contribution. We find little evidence that experience helps contributors avoid rejection. In fact, as they gain experience, contributors are even more likely to have their work rejected. We also find strong evidence of ownership behaviors in practice despite the fact that ownership of content is discouraged within Wikipedia. 0 3
A aprovação de sentidos enunciados na "Wikipédia: a enciclopédia livre" Paulo Henrique Souto Maior Serrano Anais do SILEL Portuguese 2009 2 0
A graph-based approach to mining multilingual word associations from wikipedia Zheng Ye
Xiangji Huang
Hongfei Lin
SIGIR English 2009 0 0
An interactive semantic knowledge base unifying Wikipedia and HowNet Hongzhi Guo
Qingcai Chen
Lei Cui
Xiaolong Wang
ICICS English 2009 0 0
Annotating wikipedia articles with semantic tags for structured retrieval Saravadee Sae Tan
Tang Enya Kong
Gian Chand Sodhy
SWSM English 2009 0 0
Articles as Assignments --- Modalities and Experiences of Wikipedia Use in University Courses Klaus Wannemacher ICWL English 2009 0 0
Auto-organização e processos editoriais na Wikipedia: uma análise à luz de Michel Debrun Carlos Frederico de Brito d’Andréa Anais Hipertexto 2009 Portuguese 2009 8 0
Bipartite networks of Wikipedia's articles and authors: a meso-level approach Rut Jesus
Martin Schwartz
Sune Lehmann
WikiSym English 2009 This exploratory study investigates the bipartite network of articles linked by common editors in Wikipedia, 'The Free Encyclopedia that Anyone Can Edit'. We use the articles in the categories (to depth three) of Physics and Philosophy and extract and focus on significant editors (at least 7 or 10 edits per each article). We construct a bipartite network, and from it, overlapping cliques of densely connected articles and editors. We cluster these densely connected cliques into larger modules to study examples of larger groups that display how volunteer editors flock around articles driven by interest, real-world controversies, or the result of coordination in WikiProjects. Our results confirm that topics aggregate editors; and show that highly coordinated efforts result in dense clusters. 0 1
… further results