Information quality

From WikiPapers

Information quality is included as a keyword or extra keyword in 0 datasets, 0 tools and 43 publications.

Datasets

There are no datasets for this keyword.

Tools

There are no tools for this keyword.


Publications

Title Author(s) Published in Language Date Abstract R C
"The sum of all human knowledge": A systematic review of scholarly research on the content of Wikipedia Mostafa Mesgari
Chitu Okoli
Mohamad Mehdi
Finn Årup Nielsen
Arto Lanamäki
Journal of the Association for Information Science and Technology English February 2015 Wikipedia may be the best-developed attempt thus far to gather all human knowledge in one place. Its accomplishments in this regard have made it a point of inquiry for researchers from different fields of knowledge. A decade of research has thrown light on many aspects of the Wikipedia community, its processes, and its content. However, due to the variety of fields inquiring about Wikipedia and the limited synthesis of the extensive research, there is little consensus on many aspects of Wikipedia's content as an encyclopedic collection of human knowledge. This study addresses the issue by systematically reviewing 110 peer-reviewed publications on Wikipedia content, summarizing the current findings, and highlighting the major research trends. Two major streams of research are identified: the quality of Wikipedia content (including comprehensiveness, currency, readability, and reliability) and the size of Wikipedia. Moreover, we present the key research trends in terms of the domains of inquiry, research design, data source, and data gathering methods. This review synthesizes scholarly understanding of Wikipedia content and paves the way for future studies. 0 0
Analyzing and Predicting Quality Flaws in User-generated Content: The Case of Wikipedia Maik Anderka Bauhaus-Universität Weimar, Germany English 2013 Web applications that are based on user-generated content are often criticized for containing low-quality information; a popular example is the online encyclopedia Wikipedia. The major points of criticism pertain to the accuracy, neutrality, and reliability of information. The identification of low-quality information is an important task since for a huge number of people around the world it has become a habit to first visit Wikipedia in case of an information need. Existing research on quality assessment in Wikipedia either investigates only small samples of articles, or else deals with the classification of content into high-quality or low-quality. This thesis goes further, it targets the investigation of quality flaws, thus providing specific indications of the respects in which low-quality content needs improvement. The original contributions of this thesis, which relate to the fields of user-generated content analysis, data mining, and machine learning, can be summarized as follows:

(1) We propose the investigation of quality flaws in Wikipedia based on user-defined cleanup tags. Cleanup tags are commonly used in the Wikipedia community to tag content that has some shortcomings. Our approach is based on the hypothesis that each cleanup tag defines a particular quality flaw.

(2) We provide the first comprehensive breakdown of Wikipedia's quality flaw structure. We present a flaw organization schema, and we conduct an extensive exploratory data analysis which reveals (a) the flaws that actually exist, (b) the distribution of flaws in Wikipedia, and, (c) the extent of flawed content.

(3) We present the first breakdown of Wikipedia's quality flaw evolution. We consider the entire history of the English Wikipedia from 2001 to 2012, which comprises more than 508 million page revisions, summing up to 7.9 TB. Our analysis reveals (a) how the incidence and the extent of flaws have evolved, and, (b) how the handling and the perception of flaws have changed over time.

(4) We are the first who operationalize an algorithmic prediction of quality flaws in Wikipedia. We cast quality flaw prediction as a one-class classification problem, develop a tailored quality flaw model, and employ a dedicated one-class machine learning approach. A comprehensive evaluation based on human-labeled Wikipedia articles underlines the practical applicability of our approach.
0 0
Monitoring network structure and content quality of signal processing articles on wikipedia Lee T.C.
Unnikrishnan J.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings English 2013 Wikipedia has become a widely-used resource on signal processing. However, the freelance-editing model of Wikipedia makes it challenging to maintain a high content quality. We develop techniques to monitor the network structure and content quality of Signal Processing (SP) articles on Wikipedia. Using metrics to quantify the importance and quality of articles, we generate a list of SP articles on Wikipedia arranged in the order of their need for improvement. The tools we use include the HITS and PageRank algorithms for network structure, crowdsourcing for quantifying article importance and known heuristics for article quality. 0 0
Tell me more: An actionable quality model for wikipedia Morten Warncke-Wang
Dan Cosley
John Riedl
Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013 English 2013 In this paper we address the problem of developing actionable quality models for Wikipedia, models whose features directly suggest strategies for improving the quality of a given article. We first survey the literature in order to understand the notion of article quality in the context of Wikipedia and existing approaches to automatically assess article quality. We then develop classification models with varying combinations of more or less actionable features, and find that a model that only contains clearly actionable features delivers solid performance. Lastly we discuss the implications of these results in terms of how they can help improve the quality of articles across Wikipedia. Categories and Subject Descriptors H.5 [Information Interfaces and Presentation]: Group and Organization Interfaces - Collaborative computing, Computer-supported cooperative work, Web-based interaction. Copyright 2010 ACM. 0 0
When the levee breaks: Without bots, what happens to wikipedia's quality control processes? Geiger R.S.
Aaron Halfaker
Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013 English 2013 In the first half of 2011, ClueBot NG - one of the most prolific counter-vandalism bots in the English-language Wikipedia - went down for four distinct periods, each period of downtime lasting from days to weeks. In this paper, we use these periods of breakdown as naturalistic experiments to study Wikipedia's heterogeneous quality control network, which we analyze as a multi-tiered system in which distinct classes of reviewers use various reviewing technologies to patrol for different kinds of damage at staggered time periods. Our analysis showed that the overall time-to-revert edits was almost doubled when this software agent was down. Yet while a significantly fewer proportion of edits made during the bot's downtime were reverted, we found that those edits were later eventually reverted. This suggests that other agents in Wikipedia took over this quality control work, but performed it at a far slower rate. Categories and Subject Descriptors H.5.3 [Information Systems]: Group and Organization Interfaces-computer-supported collaborative work. Copyright 2010 ACM. 0 0
A Breakdown of Quality Flaws in Wikipedia Maik Anderka
Benno Stein
2nd Joint WICOW/AIRWeb Workshop on Web Quality (WebQuality 12) English 2012 The online encyclopedia Wikipedia is a successful example of the increasing popularity of user generated content on the Web. Despite its success, Wikipedia is often criticized for containing low-quality information, which is mainly attributed to its core policy of being open for editing by everyone. The identification of low-quality information is an important task since Wikipedia has become the primary source of knowledge for a huge number of people around the world. Previous research on quality assessment in Wikipedia either investigates only small samples of articles, or else focuses on single quality aspects, like accuracy or formality. This paper targets the investigation of quality flaws, and presents the first complete breakdown of Wikipedia's quality flaw structure. We conduct an extensive exploratory analysis, which reveals (1) the quality flaws that actually exist, (2) the distribution of flaws in Wikipedia, and (3) the extent of flawed content. An important finding is that more than one in four English Wikipedia articles contains at least one quality flaw, 70% of which concern article verifiability. 0 0
Behind the Article: Recognizing Dialog Acts in Wikipedia Talk Pages Oliver Ferschke
Iryna Gurevych
Yevgen Chebotar
Proceedings of the 13th Conference of the European Chapter of the ACL (EACL 2012) 2012 In this paper, we propose an annotation schema for the discourse analysis of Wikipedia Talk pages aimed at the coordination efforts for article improvement. We apply the annotation schema to a corpus of 100 Talk pages from the Simple English Wikipedia and make the resulting dataset freely available for download. Furthermore, we perform automatic dialog act classification on Wikipedia discussions and achieve an average F1-score of 0.82 with our classification pipeline. 0 0
Codification and collaboration: Information quality in social media Kane G.C.
Ransbotham S.
International Conference on Information Systems, ICIS 2012 English 2012 This paper argues that social media combines the codification and collaboration features of earlier generations of knowledge management systems. This combination potentially changes the way knowledge is created, potentially requiring new theories and methods for understanding these processes. We forward the specialized social network method of two-mode networks as one such approach. We examine the information quality of 16,244 articles built through 2,677,397 revisions by 147,362 distinct contributors to Wikipedia's Medicine Wikiproject. We find that the structure of the contributor-artifact network is associated with information quality in these networks. Our findings have implications for managers seeking to cultivate effective knowledge creation environments using social media and to identify valuable knowledge created external to the firm. 0 0
Feature transformation method enhanced vandalism detection in wikipedia Chang T.
Hong Lin
Yi-Sheng Lin
Lecture Notes in Computer Science English 2012 A very example of web 2.0 application is Wikipedia, an online encyclopedia where anyone can edit and share information. However, blatantly unproductive edits greatly undermine the quality of Wikipedia. Their irresponsible acts force editors to waste time undoing vandalisms. For the purpose of improving information quality on Wikipedia and freeing the maintainer from such repetitive tasks, machine learning methods have been proposed to detect vandalism automatically. However, most of them focused on mining new features which seem to be inexhaustible to be discovered. Therefore, the question of how to make the best use of these features needs to be tackled. In this paper, we leverage feature transformation techniques to analyze the features and propose a framework using these methods to enhance detection. Experiment results on the public dataset PAN-WVC-10 show that our method is effective and it provides another useful method to help detect vandalism in Wikipedia. 0 0
Filling an Information Void: Using Wikipedia to Document the State of and Promote Women’s Soccer in Africa Laura Hale World of Football II, Melbourne 2012 In many parts of Africa, the development of women’s national soccer teams has been limited by a variety of local conditions including political instability, historic societal discrimination against women, sport institutions focused on the men’s game, and economic issues. For English language researchers based outside of Africa, researching African women’s soccer is difficult because of language issues, the lack of Internet sport infrastructure inside Africa, and other factors that create source limitations. Drawing attention to women’s sport, and women’s soccer in particular, is important because sport reflects on broader women’s health, human rights and education issues. One of the most visible public spaces to address this topic with sustained attention and longevity is Wikipedia. Efforts were made to systematically improve the quality of African national team articles on Wikipedia. This paper will explore the issues of using Wikipedia to promote women’s soccer in Africa by working to document the above issues in articles about African women’s national teams. 0 0
On the Evolution of Quality Flaws and the Effectiveness of Cleanup Tags in the English Wikipedia Maik Anderka
Benno Stein
Matthias Busse
Wikipedia Academy English 2012 The improvement of information quality is a major task for the free online encyclopedia Wikipedia. Recent studies targeted the analysis and detection of specific quality flaws in Wikipedia articles. To date, quality flaws have been exclusively investigated in current Wikipedia articles, based on a snapshot representing the state of Wikipedia at a certain time. This paper goes further, and provides the first comprehensive breakdown of the evolution of quality flaws in Wikipedia. We utilize cleanup tags to analyze the quality flaws that have been tagged by the Wikipedia community in the English Wikipedia, from its launch in 2001 until 2011. This leads to interesting findings regarding (1) the development of Wikipedia's quality flaw structure and (2) the usage and the effectiveness of cleanup tags. Specifically, we show that inline tags are more effective than tag boxes, and provide statistics about the considerable volume of rare and non-specific cleanup tags. We expect that this work will support the Wikipedia community in making quality assurance activities more efficient. 0 0
Overview of the 1st International Competition on Quality Flaw Prediction in Wikipedia Maik Anderka
Benno Stein
CLEF English 2012 The paper overviews the task "Quality Flaw Prediction in Wikipedia" of the PAN'12 competition. An evaluation corpus is introduced which comprises 1,592,226 English Wikipedia articles, of which 208,228 have been tagged to contain one of ten important quality flaws. Moreover, the performance of three quality flaw classifiers is evaluated. 0 0
Predicting Quality Flaws in User-generated Content: The Case of Wikipedia Maik Anderka
Benno Stein
Nedim Lipka
35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012) English 2012 The detection and improvement of low-quality information is a key concern in Web applications that are based on user-generated content; a popular example is the online encyclopedia Wikipedia. Existing research on quality assessment of user-generated content deals with the classification as to whether the content is high-quality or low-quality. This paper goes one step further: it targets the prediction of quality flaws, this way providing specific indications in which respects low-quality content needs improvement. The prediction is based on user-defined cleanup tags, which are commonly used in many Web applications to tag content that has some shortcomings. We apply this approach to the English Wikipedia, which is the largest and most popular user-generated knowledge source on the Web. We present an automatic mining approach to identify the existing cleanup tags, which provides us with a training corpus of labeled Wikipedia articles. We argue that common binary or multiclass classification approaches are ineffective for the prediction of quality flaws and hence cast quality flaw prediction as a one-class classification problem. We develop a quality flaw model and employ a dedicated machine learning approach to predict Wikipedia's most important quality flaws. Since in the Wikipedia setting the acquisition of significant test data is intricate, we analyze the effects of a biased sample selection. In this regard we illustrate the classifier effectiveness as a function of the flaw distribution in order to cope with the unknown (real-world) flaw-specific class imbalances. The flaw prediction performance is evaluated with 10,000 Wikipedia articles that have been tagged with the ten most frequent quality flaws: provided test data with little noise, four flaws can be detected with a precision close to 1. 0 0
Predicting quality flaws in user-generated content: The case of wikipedia Maik Anderka
Benno Stein
Nedim Lipka
SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval English 2012 The detection and improvement of low-quality information is a key concern in Web applications that are based on user-generated content; a popular example is the online encyclopedia Wikipedia. Existing research on quality assessment of user-generated content deals with the classification as to whether the content is high-quality or low-quality. This paper goes one step further: it targets the prediction of quality flaws, this way providing specific indications in which respects low-quality content needs improvement. The prediction is based on user-defined cleanup tags, which are commonly used in many Web applications to tag content that has some shortcomings. We apply this approach to the English Wikipedia, which is the largest and most popular user-generated knowledge source on the Web. We present an automatic mining approach to identify the existing cleanup tags, which provides us with a training corpus of labeled Wikipedia articles. We argue that common binary or multiclass classification approaches are ineffective for the prediction of quality flaws and hence cast quality flaw prediction as a one-class classification problem. We develop a quality flaw model and employ a dedicated machine learning approach to predict Wikipedia's most important quality flaws. Since in the Wikipedia setting the acquisition of significant test data is intricate, we analyze the effects of a biased sample selection. In this regard we illustrate the classifier effectiveness as a function of the flaw distribution in order to cope with the unknown (real-world) flaw-specific class imbalances. The flaw prediction performance is evaluated with 10,000 Wikipedia articles that have been tagged with the ten most frequent quality flaws: provided test data with little noise, four flaws can be detected with a precision close to 1. 0 0
The order measure model of collaborative knowledge structure Jiangnan Q.
Xuan Q.
Chunling W.
Advances in Information Sciences and Service Sciences English 2012 Collaborative knowledge structure is a kind of group knowledge structure formed through independent work, discussions between the individuals and collaborative cognition based on common understanding of cognitive object. While much research focused on measure method of knowledge structure from structure, little is considered the influence of information quality in the method. This paper regards collaborative knowledge structure in Wikipedia as an example. Based on the structure entropy model, information quality has been introduced to build the order measure model of collaborative knowledge structure in order to disclose evolution laws from structural and quality prospect in this paper. 0 0
A comparative assessment of answer quality on four question answering sites Fichman P. Journal of Information Science English 2011 Question answering (Q&A) sites, where communities of volunteers answer questions, may provide faster, cheaper, and better services than traditional institutions. However, like other Web 2.0 platforms, user-created content raises concerns about information quality. At the same time, Q&A sites may provide answers of different quality because they have different communities and technological platforms. This paper compares answer quality on four Q&A sites: Askville, WikiAnswers, Wikipedia Reference Desk, and Yahoo! Answers. Findings indicate that: (1) similar collaborative processes on these sites result in a wide range of outcomes, and significant differences in answer accuracy, completeness, and verifiability were evident; (2) answer multiplication does not always result in better information; it yields more complete and verifiable answers but does not result in higher accuracy levels; and (3) a Q&A site's popularity does not correlate with its answer quality, on all three measures. 0 0
A multimethod study of information quality in wiki collaboration Gerald C. Kane ACM Trans. Manage. Inf. Syst. English 2011 0 0
Detection of Text Quality Flaws as a One-class Classification Problem Maik Anderka
Benno Stein
Nedim Lipka
20th ACM Conference on Information and Knowledge Management (CIKM 11) English 2011 For Web applications that are based on user generated content the detection of text quality flaws is a key concern. Our research contributes to automatic quality flaw detection. In particular, we propose to cast the detection of text quality flaws as a one-class classification problem: we are given only positive examples (= texts containing a particular quality flaw) and decide whether or not an unseen text suffers from this flaw. We argue that common binary or multiclass classification approaches are ineffective in here, and we underpin our approach by a real-world application: we employ a dedicated one-class learning approach to determine whether a given Wikipedia article suffers from certain quality flaws. Since in the Wikipedia setting the acquisition of sensible test data is quite intricate, we analyze the effects of a biased sample selection. In addition, we illustrate the classifier effectiveness as a function of the flaw distribution in order to cope with the unknown (real-world) flaw-specific class imbalances. Altogether, provided test data with little noise, four from ten important quality flaws in Wikipedia can be detected with a precision close to 1. 0 0
Information Quality in Wikipedia: The Effects of Group Composition and Task Conflict Ofer Arazy
Oded Nov
Raymond Patterson
Lisa Yeo
J. Manage. Inf. Syst. English 2011 0 2
Information quality in wikipedia: The effects of group composition and task conflict Ofer Arazy
Oded Nov
Raymond Patterson
Lisa Yeo
Journal of Management Information Systems English 2011 The success of Wikipedia demonstrates that self-organizing production communities can produce high-quality information-based products. Research on Wikipedia has proceeded largely atheoretically, focusing on (1) the diversity in members' knowledge bases as a determinant of Wikipedia's content quality, (2) the task-related conflicts that occur during the collaborative authoring process, and (3) the different roles members play in Wikipedia. We develop a theoretical model that explains how these three factors interact to determine the quality of Wikipedia articles. The results from the empirical study of 96 Wikipedia articles suggest that (1) diversity should be encouraged, as the creative abrasion that is generated when cognitively diverse members engage in task-related conflict leads to higher-quality articles, (2) task conflict should be managed, as conflict-notwithstanding its contribution to creative abrasion-can negatively affect group output, and (3) groups should maintain a balance of both administrative- and content-oriented members, as both contribute to the collaborative process. © 2011 M.E. Sharpe, Inc. 0 2
Predicting the perceived quality of online mathematics contributions from users' reputations Tausczik Y.R.
Pennebaker J.W.
Conference on Human Factors in Computing Systems - Proceedings English 2011 There are two perspectives on the role of reputation in collaborative online projects such as Wikipedia or Yahoo! Answers. One, user reputation should be minimized in order to increase the number of contributions from a wide user base. Two, user reputation should be used as a heuristic to identify and promote high quality contributions. The current study examined how offline and online reputations of contributors affect perceived quality in MathOverflow, an online community with 3470 active users. On MathOverflow, users post high-level mathematics questions and answers. Community members also rate the quality of the questions and answers. This study is unique in being able to measure offline reputation of users. Both offline and online reputations were consistently and independently related to the perceived quality of authors submissions, and there was only a moderate correlation between established offline and newly developed online reputation. Copyright 2011 ACM. 0 0
Towards Automatic Quality Assurance in Wikipedia Maik Anderka
Benno Stein
Nedim Lipka
Proceedings of the 20th International Conference on World Wide Web 2011 Featured articles in Wikipedia stand for high information quality, and it has been found interesting to researchers to analyze whether and how they can be distinguished from "ordinary" articles. Here we point out that article discrimination falls far short of writer support or automatic quality assurance: Featured articles are not identified, but are made. Following this motto we compile a comprehensive list of information quality flaws in Wikipedia, model them according to the latest state of the art, and devise one-class classification technology for their identification. 0 0
Towards automatic quality assurance in Wikipedia Maik Anderka
Benno Stein
Nedim Lipka
20th International Conference on World Wide Web (WWW 11) English 2011 Featured articles in Wikipedia stand for high information quality, and it has been found interesting to researchers to analyze whether and how they can be distinguished from "ordinary" articles. Here we point out that article discrimination falls far short of writer support or automatic quality assurance: Featured articles are not identified, but are made. Following this motto we compile a comprehensive list of information quality flaws in Wikipedia, model them according to the latest state of the art, and devise one-class classification technology for their identification. 0 0
Determinants of Wikipedia quality: the roles of global and local contribution inequality Ofer Arazy
Oded Nov
English 2010 The success of Wikipedia and the relative high quality of its articles seem to contradict conventional wisdom. Recent studies have begun shedding light on the processes contributing to Wikipedia's success, highlighting the role of coordination and contribution inequality. In this study, we expand on these works in two ways. First, we make a distinction between global (Wikipedia-wide) and local (article-specific) inequality and investigate both constructs. Second, we explore both direct and indirect effects of these inequalities, exposing the intricate relationships between global inequality, local inequality, coordination, and article quality. We tested our hypotheses on a sample of Wikipedia articles using structural equation modeling and found that global inequality exerts significant positive impact on article quality, while the effect of local inequality is indirect and is mediated by coordination. 0 1
Identifying featured articles in Wikipedia: Writing style matters Nedim Lipka
Benno Stein
Proceedings of the 19th International Conference on World Wide Web, WWW '10 English 2010 Wikipedia provides an information quality assessment model with criteria for human peer reviewers to identify featured articles. For this classification task "Is an article featured or not?" we present a machine learning approach that exploits an article's character trigram distribution. Our approach differs from existing research in that it aims to writing style rather than evaluating meta features like the edit history. The approach is robust, straightforward to implement, and outperforms existing solutions. We underpin these claims by an experiment design where, among others, the domain transferability is analyzed. The achieved performances in terms of the F-measure for featured articles are 0.964 within a single Wikipedia domain and 0.880 in a domain transfer situation. 0 1
Identifying featured articles in wikipedia: writing style matters Nedim Lipka
Benno Stein
World Wide Web English 2010 0 1
Mining the Factors Affecting the Quality of Wikipedia Articles Kewen Wu
Qinghua Zhu
Yuxiang Zhao
Hua Zheng
ISME English 2010 0 0
Mining the factors affecting the quality of Wikipedia articles Wu K.
Qinghua Zhu
Yang Zhao
Hua Zheng
Proceedings - 2010 International Conference of Information Science and Management Engineering, ISME 2010 English 2010 In order to observe the variation of factors affecting the quality of Wikipedia articles during the information quality improvement process, we proposed 28 metrics from four aspects, including lingual, structural, historical and reputational features, and then weighted each metrics in different stages by using neural network. We found lingual features weighted more in the lower quality stages, and structural features, along with historical features, became more important while article quality improved. However, reputational features did not act as important as expected. The findings indicate that the information quality is mainly affected by completeness, and well-written is a basic requirement in the initial stage. Reputation of authors or editors is not so important in Wikipedia because of its horizontal structure. 0 0
Trust in wikipedia: How users trust information from an unknown source Teun Lucassen
Schraagen J.M.
Proceedings of the 4th Workshop on Information Credibility, WICOW '10 English 2010 The use of Wikipedia as an information source is becoming increasingly popular. Several studies have shown that its information quality is high. Normally, when considering information trust, the source of information is an important factor. However, because of the open-source nature of Wikipedia articles, their sources remain mostly unknown. This means that other features need to be used to assess the trustworthiness of the articles. We describe article features - such as images and references - which lay Wikipedia readers use to estimate trustworthiness. The quality and the topics of the articles are manipulated in an experiment to reproduce the varying quality on Wikipedia and the familiarity of the readers with the topics. We show that the three most important features are textual features, references and images. 0 2
An empirical study on criteria for assessing information quality in corporate wikis Friberg T.
Reinhardt W.
Proceedings of the 2009 International Conference on Information Quality, ICIQ 2009 English 2009 Wikis gain more and more attention as tool for corporate knowledge management. The usage of corporate wikis differs from public wikis like the Wikipedia as there are hardly any wiki wars or copyright issues. Nevertheless the quality of the available articles is of high importance in corporate wikis as well as in public ones. This paper presents the results from an empirical study on criteria for assessing information quality of articles in corporate wikis. Therefore existing approaches for assessing information quality are evaluated and a specific wiki-set of criteria is defined. This wiki-set was examined in a study with participants from 21 different German companies using wikis as essential part of their knowledge management toolbox. Furthermore this paper discusses various ways for the automatic and manual rating of information quality and the technical implementation of such an IQ-profile for wikis. 0 0
Experiments with wikipedia cross-language data fusion Tacchini E.
Schultz A.
Christian Bizer
CEUR Workshop Proceedings English 2009 There are currently Wikipedia editions in 264 different languages. Each of these editions contains infoboxes that provide structured data about the topic of the article in which an infobox is contained. The content of infoboxes about the same topic in different Wikipedia editions varies in completeness, coverage and quality. This paper examines the hypothesis that by extracting infobox data from multiple Wikipedia editions and by fusing the extracted data among editions it should be possible to complement data from one edition with previously missing values from other editions and to increase the overall quality of the extracted dataset by choosing property values that are most likely correct in case of inconsistencies among editions. We will present a software framework for fusing RDF datasets based on different conflict resolution strategies. We will apply the framework to fuse infobox data that has been extracted from the English, German, Italian and French editions of Wikipedia and will discuss the accuracy of the conflict resolution strategies that were used in this experiment. 0 0
Towards assessing information quality in knowledge management in the enterprise 2.0 Ahlheid S.
Friberg T.
Graefe G.
Krebs A.
Muller J.-P.
Schuster D.
Proceedings of the 2009 International Conference on Information Quality, ICIQ 2009 English 2009 With regard to the success stories of Web 2.0 based knowledge centers such as the online encyclopedia Wikipedia [54], companies have begun to enrich their corporate knowledge management with Web 2.0 technologies, hoping to benefit from increasing flows of information. Besides information quantity, the quality of information is a key factor determining the return on investment of such Enterprise 2.0 platforms. In this context we will discuss requirements for the concept of information quality, identify important differences to the Web 2.0 environment and also elaborate on the basic design of a system assessing information quality in an Enterprise 2.0 context. We will thereby integrate implicit user feedback and explain the key benefits of this novel approach. 0 0
An activity theoretic model for information quality change Stvilia B.
Gasser L.
First Monday, 13(4) 2008 To manage information quality (IQ) effectively, one needs to know how IQ changes over time, what causes it to change, and whether the changes can be predicted. In this paper we analyze the structure of IQ change in Wikipedia, an open, collaborative general encyclopedia. We found several patterns in Wikipedia’s IQ process trajectories and linked them to article types. Drawing on the results of our analysis, we develop a general model of IQ change that can be used for reasoning about IQ dynamics in many different settings, including traditional databases and information repositories. 0 1
Size matters: Word count as a measure of quality on Wikipedia Blumenstock J.E. Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08 English 2008 Wikipedia, "the free encyclopedia", now contains over two million English articles, and is widely regarded as a high-quality, authoritative encyclopedia. Some Wikipedia articles, however, are of questionable quality, and it is not always apparent to the visitor which articles are good and which are bad. We propose a simple metric - word count - for measuring article quality. In spite of its striking simplicity, we show that this metric significantly outperforms the more complex methods described in related work. 0 3
Size matters: word count as a measure of quality on wikipedia Joshua E. Blumenstock World Wide Web English 2008 0 3
The adoption of Wikipedia: A community- and information quality-based view Kai Wang
Lin C.-L.
Chen C.-D.
Yang S.-C.
PACIS 2008 - 12th Pacific Asia Conference on Information Systems: Leveraging ICT for Resilient Organizations and Sustainable Growth in the Asia Pacific Region English 2008 The Web 2.0 model has aroused vast attention as it alters the traditional role of Internet users as pure information receivers. Wikipedia, as one of the most successful cases of the Web 2.0 model, creates an online encyclopedia through the collective efforts of volunteers. Shared freely by all Internet users, it forms an online community platform on which users can seek and share knowledge. This study investigates the factors that affect the adoption of Wikipedia. Based on the TAM of Davis (1989), perceived critical mass, community identification, and perceived information quality were incorporated into the research model to explain the intention and usage of Wikipedia. This research is a work-in-progress and a questionnaire survey will be executed, targeting Internet users who had prior experience with knowledge seeking on Wikipedia. 0 0
The adoption of Wikipedia: a community- and information quality-based view Kai Wang
Chien-Liang Lin
Chun-Der Chen
Shu-Chen Yang
12th Pacific Asia Conference on Information Systems (PACIS) 2008 0 0
Scientific citations in Wikipedia Finn Årup Nielsen First Monday English 6 August 2007 The Internet–based encyclopædia Wikipedia has grown to become one of the most visited Web sites on the Internet, but critics have questioned the quality of entries. An empirical study of Wikipedia found errors in a 2005 sample of science entries. Biased coverage and lack of sources are among the “Wikipedia risks.” This paper describes a simple assessment of these aspects by examining the outbound links from Wikipedia articles to articles in scientific journals with a comparison against journal statistics from Journal Citation Reports such as impact factors. The results show an increasing use of structured citation markup and good agreement with citation patterns seen in the scientific literature though with a slight tendency to cite articles in high–impact journals such as Nature and Science. These results increase confidence in Wikipedia as a reliable information resource for science in general. 7 7
A model for information quality change: Completed paper Besiki Stvilia Proceedings of the 2007 International Conference on Information Quality, ICIQ 2007 English 2007 To manage information quality (IQ) effectively, one needs to know how IQ changes over time, what causes it to change, and whether the changes can be predicted. In this paper we analyze the structure of IQ change in Wikipedia, an open, collaborative general encyclopedia. We found several patterns in Wikipedia's IQ process trajectories and linked them to article types. Drawing on the results of our analysis, we develop a general model of IQ change that can be used for reasoning about IQ dynamics in many different settings, including traditional databases. 0 0
Interlingual aspects of wikipedia's quality Hammwohner R. Proceedings of the 2007 International Conference on Information Quality, ICIQ 2007 English 2007 This paper presents interim results of an ongoing project on quality issues concerning Wikipedia. One focus of research is the relation of language and quality measurement. The other one is the use of interlingual relations for quality assessment and improvement. The study is based on mono- and multilingual samples of featured and non-featured Wikipedia articles in English, French, German, and Italian that are evaluated automatically. 0 1
Wisdom of the Crowds: Decentralized Knowledge Construction in Wikipedia Ofer Arazy
Wayne Morgan
Raymond Patterson
16th Annual Workshop on Information Technologies & Systems (WITS) 2006 Recently, Nature published an article comparing the quality of Wikipedia articles to those of Encyclopedia Britannica (Giles 2005). The article, which gained much public attention, provides evidence for Wikipedia quality, but does not provide an explanation of the underlying source of that quality. Wikipedia, and wikis in general, aggregate information from a large and diverse author-base, where authors are free to modify any article. Building upon Surowiecki's (2005) Wisdom of Crowds, we develop a model of the factors that determine wiki content quality. In an empirical study of Wikipedia, we find strong support for our model. Our results indicate that increasing size and diversity of the author-base improves content quality. We conclude by highlighting implications for system design and suggesting avenues for future research. 0 0
Assessing information quality of a community-based encyclopedia Besiki Stvilia
Michael B. Twidale
Linda C. Smith
Les Gasser
Proceedings of the International Conference on Information Quality English 2005 Effective information quality analysis needs powerful yet easy ways to obtain metrics. The English version of Wikipedia provides an extremely interesting yet challenging case for the study of Information Quality dynamics at both macro and micro levels. We propose seven IQ metrics which can be evaluated automatically and test the set on a representative sample of Wikipedia content. The methodology of the metrics construction and the results of tests, along with a number of statistical characterizations of Wikipedia articles, their content construction, process metadata and social context are reported. 5 4
Information quality in a community-based encyclopedia Stvilia B.
Twidale M. B.
Gasser L.
Smith L. C.
Knowledge Management: Nurturing Culture, Innovation, and Technology - Proceedings of the 2005 International Conference on Knowledge Management (pp. 101-113) 2005 We examine the Information Quality aspects of Wikipedia. By a study of the discussion pages and other process-oriented pages within the Wikipedia project, it is possible to determine the information quality dimensions that participants in the editing process care about, how they talk about them, what tradeoffs they make between these dimensions and how the quality assessment and improvement process operates. This analysis helps in understanding how high quality is maintained in a project where anyone may participate with no prior vetting. It also carries implications for improving the quality of more conventional datasets. 0 1