Felipe Ortega

Felipe Ortega is an author from Spain.

Tools

WikiXRay: a robust and extensible software tool for an in-depth quantitative analysis of the whole Wikipedia project.
Wikipedia Data Analysis Toolkit

Video: https://vimeo.com/8758454

Publications

Only those publications related to wikis are shown here.

Is Wikipedia Inefficient? Modelling Effort and Participation in Wikipedia
Keywords: Data Envelopment Analysis, Efficiency, Wikipedia
Published in: Proceedings of the Annual Hawaii International Conference on System Sciences (HICSS). Language: English. Date: 2013.
Abstract: Concerns have been raised about the decreased ability of Wikipedia to recruit editors and to harness the effort of contributors to create new articles and improve existing ones. But, as [1], [2] explain, in collective projects participants are few and efforts costly during the initial stage; in the diffusion phase, the number of participants grows as their efforts become rewarding; and in the mature phase, some inefficiency may appear as the number of contributors exceeds what the work requires. In this paper, drawing on original data we extract from 36 of the main language projects, we compare the efficiency of Wikipedia projects in different languages and at different stages of development to examine this effect.

Sustainability of Open Collaborative Communities: Analyzing Recruitment Efficiency
Keywords: DEA modeling, Efficiency, Recruitment, Wikipedia
Published in: Technology Innovation Management Review. Date: January 2013.

El potlatch digital. Wikipedia y el triunfo del procomún y el conocimiento compartido
Language: Spanish. Date: 2011.
Abstract (translated from Spanish): In 1968, Garrett Hardin published a seminal article in the journal Science, "The Tragedy of the Commons", reflecting on the difficulty of managing common goods and resources and on the danger to which their survival was exposed. Elinor Ostrom, Nobel laureate in Economics, would spend most of her professional life investigating precisely the mechanisms of collective action and the cooperative management of the commons, trying to infer common structural characteristics from good practices. With the invention of the Internet and the digitization of knowledge, the earlier analog problem re-emerges vigorously in a digital version: how can online communities whose purpose is the generation of shared knowledge arise and govern themselves? That is, how can and should the digital commons be managed? Wikipedia offers a prototypical and flourishing example of the construction of a community that agrees on its policies by consensus, establishes its internal mechanisms of recognition, and organizes its devices of control and vigilance, all without cash of any kind changing hands. The case of the Canadian potlatch helps us understand how, in certain contexts and circumstances, it is necessary to give away the capital one possesses so that the community returns and reintegrates it in the form of recognition and renown; and how, in certain cultural contexts, the kind of capital that circulates is not monetary but symbolic, taking the form of reputation and popularity, with a logic of accumulation that demands disinterestedness in order to generate another form of interest. This is how some of the best-known cases on the Internet work, and this is how Wikipedia has become an example of the triumph of the management of the commons and of shared knowledge.

A Statistical Approach to the Impact of Featured Articles in Wikipedia
Keywords: Wikipedia, Usage patterns, Traffic characterization, Quantitative analysis
Published in: KEOD 2010 - Proceedings of the International Conference on Knowledge Engineering and Ontology Development. Language: English. Date: 2010.
Abstract: This paper presents an empirical study of the impact of featured articles on the attention that Wikipedia's articles attract, and of how this behavior differs across editions of Wikipedia. The study is based on the analysis of the log lines registered by the Wikimedia Foundation Squid servers after serving the content requested by Wikipedia users. The analysis covers the six most visited editions of Wikipedia and involves more than 4,100 million log lines corresponding to the traffic of September, October and November 2009. The methodology consists mainly of parsing the requests sent by users and filtering them according to the study directives; the relevant information fields are then stored in a database for persistence and further characterization. The main results are twofold: the paper shows how to use the traffic log to extract information about the use of Wikipedia, a novel research approach without precedent in the research community, and it analyzes whether the featured-article mechanism succeeds in attracting more attention.

A quantitative approach to the use of the Wikipedia
Language: English. Date: 2009.
Abstract: This paper presents a quantitative study of the use of the Wikipedia system by its users (both readers and editors), with special focus on the identification of time and kind-of-use patterns, the characterization of traffic and workload, and a comparative analysis of different language editions. The study is based on filtering and analyzing a large sample of the requests directed to the Wikimedia systems during six weeks, one in each month from November 2007 to April 2008. In particular, we considered the twenty most frequently visited language editions of Wikipedia, identifying for each access the corresponding namespace (a set of resources with uniform semantics), resource name (an article name, for example) and action (edits, submissions, history reviews, save operations, etc.). The results include the identification of weekly and daily patterns, and several correlations between different actions on the articles. In summary, the study gives an overall picture of how the most visited language editions of Wikipedia are being accessed by their users.
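
The namespace/resource/action classification described in this abstract can be sketched in a few lines. The following is a hypothetical illustration, not the parser used in the study: it assumes only the standard MediaWiki URL layouts (/wiki/<Title> for page views and /w/index.php?title=...&action=... for edits, history reviews and similar actions).

```python
from urllib.parse import urlparse, parse_qs

def classify_request(url: str):
    """Split a MediaWiki request URL into (edition, action, resource).

    Illustrative only; assumes the standard layouts
    https://<lang>.wikipedia.org/wiki/<Title> (a page view) and
    https://<lang>.wikipedia.org/w/index.php?title=...&action=...
    """
    parts = urlparse(url)
    edition = parts.netloc.split(".")[0]              # e.g. "en", "de"
    if parts.path.startswith("/wiki/"):
        return edition, "view", parts.path[len("/wiki/"):]
    if parts.path == "/w/index.php":
        query = parse_qs(parts.query)
        title = query.get("title", [""])[0]
        action = query.get("action", ["view"])[0]     # edit, history, submit...
        return edition, action, title
    return edition, "other", parts.path

# The namespace is the title prefix, e.g. "Talk:Berlin" lives in "Talk".
print(classify_request("https://de.wikipedia.org/w/index.php?title=Berlin&action=history"))
# -> ('de', 'history', 'Berlin')
```
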
Measuring Wikipedia: A hands-on tutorial
Keywords: Data mining, Empirical research, Measurements, Wikipedia, WikiTrust, WikiXRay
Published in: WikiSym. Language: English. Date: 2009.
Abstract: This tutorial is an introduction to the best methodologies, tools and practices for Wikipedia research. The tutorial will be led by Luca de Alfaro (Wiki Lab at UCSC, California, USA) and Felipe Ortega (Libresoft, URJC, Madrid, Spain). Both have accumulated several years of practical experience exploring and processing Wikipedia data [1], [2], [3], and their respective research groups have led the development of two cutting-edge software tools for analyzing Wikipedia: WikiTrust and WikiXRay. WikiTrust implements an author reputation system and a text trust system for wikis. WikiXRay is a tool automating the quantitative analysis of any language version of Wikipedia (in general, of any wiki based on MediaWiki).

On the Analysis of Contributions from Privileged Users in Virtual Open Communities
Keywords: Libre software, Wikipedia
Published in: HICSS. Language: English. Date: 2009.
Abstract: Collaborative projects built around virtual communities on the Internet have gained momentum over the last decade. Nevertheless, their rapid growth raises some questions: what is the most effective approach to manage and organize their content creation process? Can these communities scale, keeping control of their projects as their size continues to grow over time? To answer these questions, we undertake a quantitative analysis of privileged users in FLOSS development projects and in Wikipedia. From our results, we conclude that the inequality level of user contributions in the two types of initiative is remarkably distinct, even though both communities present almost identical patterns in the number of distinct contributors per file (in FLOSS projects) or per article (in Wikipedia). As a result, totally open projects like Wikipedia can effectively deal with faster growth rates, while FLOSS projects may be affected by bottlenecks around committers, who play critical roles.

Survival analysis in open development projects
Published in: Proceedings of the 2009 ICSE Workshop on Emerging Trends in Free/Libre/Open Source Software Research and Development, FLOSS 2009. Language: English. Date: 2009.
Abstract: Open collaborative projects, like FLOSS development projects and open content creation projects (e.g. Wikipedia), depend heavily on contributions from their respective communities to improve. In this context, an important question for both researchers and practitioners is: what is the expected lifetime of contributors in a community? By answering it, we can characterize these communities, since an appropriate model can show whether users maintain their interest in contributing and for how long we can expect them to collaborate, and as a result improve the organization and management of the project. In this paper, we demonstrate that survival analysis, a well-known statistical methodology in research areas such as epidemiology, biology and demographic studies, is a useful way to undertake a quantitative comparison of the lifetime of contributors in open collaborative initiatives, such as FLOSS projects and Wikipedia, providing insightful answers to this challenging question.
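
As an illustration of the methodology named here (not the paper's actual analysis), the sketch below fits a Kaplan-Meier survival curve to contributor lifetimes using the lifelines Python library; the durations are invented.

```python
# Kaplan-Meier estimate of contributor lifetimes (invented data, not the
# paper's dataset). Requires: pip install lifelines
from lifelines import KaplanMeierFitter

# Days between each contributor's first and last observed contribution.
durations = [30, 45, 120, 200, 15, 400, 365, 60, 90, 10]
# 1 = the contributor abandoned the project (event observed);
# 0 = still active at the end of the observation window (censored).
observed = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=observed, label="contributors")

print(kmf.median_survival_time_)      # estimated median lifetime, in days
print(kmf.survival_function_.head())  # S(t): probability of still contributing at t
```

Censoring is the point of using survival analysis instead of a plain average: contributors who are still active only give a lower bound on their lifetime, and the Kaplan-Meier estimator accounts for that.
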
Wikipedia: A Quantitative Analysis
Published in: Universidad Rey Juan Carlos, Spain. Language: English. Date: 2009.
Abstract: In this doctoral thesis, we undertake a quantitative analysis of the top-ten language editions of Wikipedia from different perspectives. Our main goal has been to trace the evolution over time of key descriptive and organizational parameters of Wikipedia and its community of authors. The analysis focuses on logged authors (those editors who created a personal account to participate in the project). Among the metrics included are the monthly evolution of general metrics (number of revisions, active editors, active pages), the distribution of pages and their length, and the evolution of participation in discussion pages. We also present a detailed analysis of the inner social structure and stratification of the Wikipedia community of logged authors, fitting appropriate distributions to the most relevant metrics, and we examine the inequality level of contributions from logged authors, showing that there exists a core of very active authors who undertake most of the editorial work. Regarding articles, the inequality analysis also shows that there exists a reduced group of popular articles, though the distribution of revisions is not as skewed as in the previous case. The analysis continues with an in-depth demographic study of the community of authors, focusing on the evolution of the core of very active contributors (applying a statistical technique known as survival analysis). We also explore some basic metrics to analyze the quality of Wikipedia articles and the trustworthiness level of individual authors. The work concludes with an extended analysis of the evolution of the most influential parameters and metrics presented earlier, from which we infer important conclusions about the future sustainability of Wikipedia.
According to these results, the Wikipedia community of authors ceased to grow, remaining stable from Summer 2006 until the end of 2007. As a result, the monthly number of revisions remained stable over the same period, restricting the number of articles that can be reviewed by the community. On the other hand, whilst the number of revisions in talk pages also stabilized over the same period, the number of active talk pages follows a steadily growing rate for all versions. This suggests that the community of authors is shifting its focus to broaden the coverage of discussion pages, which has a direct impact on the final quality of content, as previous research has shown. Regarding the inner social structure of the community of logged authors, we find Pareto-like distributions that fit all relevant metrics pertaining to authors (number of revisions per author, number of different articles edited per author), while measurements on articles (number of revisions per article, number of different authors per article) follow lognormal shapes. The analysis of the inequality level of revisions performed by authors and revisions received by articles shows highly unequal distributions. The results of our survival analysis on Wikipedia authors present very high mortality percentages among young authors, revealing an endemic problem of Wikipedias in keeping young editors collaborating with the project for a long period of time.
Likewise, from our survival analysis we obtain that the mean lifetime of Wikipedia authors in the core (until they abandon the group of top editors) lies between 200 and 400 days for all versions, while the median value is lower than 120 days in all cases. Moreover, the analysis of the monthly number of births and deaths in the community of logged authors reveals that the shift in the monthly trend of active authors is produced by a higher number of deaths from Summer 2006 onward in all versions, surpassing the monthly number of births from then on. The analysis of the inequality level of contributions over time, and of the evolution of additional key features identified in this thesis, reveals a worrying trend towards a progressive increase of the effort spent by core authors as time elapses. This trend may eventually cause these authors to reach their upper limit in the number of revisions they can perform each month, thus starting a decreasing trend in the number of monthly revisions and an overall recession of the content creation and reviewing process in Wikipedia. To prevent this probable future scenario, the number of monthly new editors should be raised again, perhaps through the adoption of specific policies and campaigns for attracting new editors to Wikipedia and recovering former top contributors.
Finally, another important contribution for the research community is WikiXRay, the software tool we have developed to perform the statistical analyses included in this thesis. This tool completely automates the process of retrieving the database dumps from the Wikimedia public repositories, processing them to obtain key metrics and descriptive parameters, and loading them into a local database, ready to be used in empirical analyses. As far as we know, this is the first research work implementing a comparative analysis, from a quantitative point of view, of the top-ten language editions of Wikipedia, presenting results from many different scientific perspectives. We expect that this contribution will help the scientific community to enhance its understanding of the rich, complex and fascinating working mechanisms and behavioral patterns of the Wikipedia project and its community of authors. Likewise, we hope that WikiXRay will facilitate the hard task of developing empirical analyses on any language version of the encyclopedia, boosting the number of comparative studies like this one in many other scientific disciplines.

On the Inequality of Contributions to Wikipedia
Language: English. Date: 2008.
Abstract: Wikipedia is one of the most successful examples of massive collaborative content development. However, many of the mechanisms and procedures that it uses are still not known in detail. For instance, how equal (or unequal) the contributions to it are has been discussed in recent years, with no conclusive results. In this paper, we study exactly that aspect using Lorenz curves and Gini coefficients, instruments very well known to economists. We analyze the trends in the inequality of distributions for the ten biggest language editions of Wikipedia and their evolution over time. As a result, we have found large differences in the number of contributions by different authors (something also observed in free, open source software development), and a trend towards stable patterns of inequality in the long run.
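
The instruments named in this abstract are easy to reproduce. Below is a minimal numpy sketch that computes the Gini coefficient and Lorenz curve points from a vector of per-author edit counts; the sample counts are invented.

```python
import numpy as np

def gini(values):
    """Gini coefficient of non-negative contribution counts.

    0 means perfect equality; values near 1 mean a few authors
    make almost all the edits. Uses the sorted-rank identity.
    """
    v = np.sort(np.asarray(values, dtype=float))
    n = v.size
    ranks = np.arange(1, n + 1)
    return (2 * np.sum(ranks * v) / (n * v.sum())) - (n + 1) / n

def lorenz_points(values):
    """Cumulative share of edits versus cumulative share of authors."""
    v = np.sort(np.asarray(values, dtype=float))
    author_share = np.arange(1, v.size + 1) / v.size
    edit_share = np.cumsum(v) / v.sum()
    return author_share, edit_share

edits = [1, 1, 2, 3, 5, 8, 200, 500]       # invented per-author edit counts
print(round(gini(edits), 3))                # 0.783: a highly unequal distribution
xs, ys = lorenz_points(edits)
print(list(zip(xs.round(2), ys.round(3))))  # points on the Lorenz curve
```
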
Quantitative analysis and characterization of Wikipedia requests
Published in: WikiSym. Language: English. Date: 2008.
Abstract: Our poster describes the quantitative analysis carried out to study the use of the Wikipedia system by its users, with special focus on the identification of time and kind-of-use patterns, the characterization of traffic and workload, and a comparative analysis of different language editions. By filtering and classifying a large sample of the requests directed to the Wikimedia systems over 7 days, we have been able to identify important information such as the targeted namespaces, the visited resources and the requested actions. The results include the identification of weekly and daily patterns, and several correlations between different actions on the articles. In summary, the study gives an overall picture of how the most visited language editions of Wikipedia are being accessed by their users.

Quantitative analysis of the top ten wikipedias
Keywords: Collaborative development, Growth metrics, Quantitative analysis, Wikipedia
Published in: Communications in Computer and Information Science. Language: English. Date: 2008.
Abstract: In a few years, Wikipedia has become one of the information systems with the largest public on the Internet. Based on a relatively simple architecture, it has proven capable of supporting the largest and most diverse community of collaborative authorship worldwide. Using a quantitative methodology (analyzing the public Wikipedia databases), we describe the main characteristics of the 10 largest language editions and the authors who work in them. The methodology is generic enough to be used on the rest of the editions, providing a convenient framework to develop a complete quantitative analysis of Wikipedia. Among other parameters, we study the evolution of the number of contributions and articles, their size, and the differences in contributions by different authors, inferring some relationships between contribution patterns and content. These relationships reflect (and in part explain) the evolution of the different language editions so far, as well as their future trends.

Workshop on interdisciplinary research on Wikipedia and wiki communities
Keywords: Collaboration, Interdisciplinary, Methodologies, Wiki communities, Wikipedia, Workshop wiki communities
Published in: WikiSym 2008 - The 4th International Symposium on Wikis, Proceedings. Language: English. Date: 2008.
Abstract: A growing number of projects seek to build upon the collective intelligence of Internet users, looking for more dynamic, open and creative approaches to content creation and knowledge sharing. To this end, many projects have chosen the wiki, which is therefore the subject of much research interest, particularly Wikipedia, from varied disciplines. The array of approaches to studying wikis is a source of wealth, but also a possible source of confusion: what are appropriate methodologies for the analysis of wiki communities? Which are the most critical parameters (both quantitative and qualitative) to study in wiki evolution and outcomes? Is it possible to find effective interdisciplinary approaches that augment our overall understanding of these dynamic, creative environments? This workshop intends to provide an opportunity for researchers and practitioners willing to participate in a "brainstorming research meeting" to explore these questions.

Quantitative Analysis of the Wikipedia Community of Users
Published in: WikiSym. Language: English. Date: 2007.
Abstract: Many activities of editors in Wikipedia can be traced using its database dumps, which register detailed information about every single change to every article. Several researchers have used this information to gain knowledge about the production process of articles and about the activity patterns of authors. In this analysis, we focus on one of those previous works, by Kittur et al. First, we follow the same methodology with more recent and comprehensive data. Then, we extend this methodology to precisely identify which fraction of authors produce most of the changes in Wikipedia's articles, and how the behaviour of these authors evolves over time. This enables us not only to validate some of the previous results, but also to find new interesting evidence. We find that the analysis of sysops is not a good method for estimating different levels of contributions, since it depends on the policy for electing them (which changes over time and across languages). Moreover, we find new activity patterns by classifying authors by their contributions during specific periods of time, instead of using their total number of contributions over the whole life of Wikipedia. Finally, we present a tool that automates this extended methodology, implementing a quick and complete quantitative analysis of every language edition in Wikipedia.
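
The per-period classification described here, counting contributions inside specific time windows instead of lifetime totals, takes only a few lines of pandas; the revision log below is invented for illustration.

```python
import pandas as pd

# Invented revision log: one row per edit (author, timestamp).
revs = pd.DataFrame({
    "author": ["alice", "alice", "bob", "alice", "bob", "carol"],
    "ts": pd.to_datetime(["2007-01-03", "2007-01-20", "2007-01-21",
                          "2007-02-02", "2007-03-15", "2007-03-16"]),
})

# Edits per author per calendar month, rather than lifetime totals,
# so an author's activity level can change from period to period.
monthly = (revs.groupby(["author", revs["ts"].dt.to_period("M")])
               .size().rename("edits").reset_index())
print(monthly)
```
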
The Top Ten Wikipedias: A quantitative analysis using WikiXRay
Keywords: Collaborative development, Growth metrics, Quantitative analysis, Wikipedia
Published in: ICSOFT 2007 - 2nd International Conference on Software and Data Technologies, Proceedings. July 2007, Barcelona, Spain. Language: English. Date: 2007.
Abstract: In a few years, Wikipedia has become one of the information systems with the largest public (both producers and consumers) on the Internet. Its system and information architecture is relatively simple, but has proven capable of supporting the largest and most diverse community of collaborative authorship worldwide. In this paper, we analyze this community, and the contents it is producing, in detail. Using a quantitative methodology based on the analysis of the public Wikipedia databases, we describe the main characteristics of the 10 largest language editions and the authors who work in them. The methodology (which is almost completely automated) is generic enough to be used on the rest of the editions, providing a convenient framework to develop a complete quantitative analysis of Wikipedia. Among other parameters, we study the evolution of the number of contributions and articles, their size, and the differences in contributions by different authors, inferring some relationships between contribution patterns and content. These relationships reflect (and in part explain) the evolution of the different language editions so far, as well as their future trends.