Information quality
From WikiPapers
Information quality is included as a keyword or extra keyword in 0 datasets, 0 tools and 12 publications.
Datasets
There are no datasets for this keyword.
Tools
There are no tools for this keyword.
Publications
| Title | Author(s) | Published in | Language | Date | Abstract | R | C |
|---|---|---|---|---|---|---|---|
| A Breakdown of Quality Flaws in Wikipedia | Maik Anderka, Benno Stein | 2nd Joint WICOW/AIRWeb Workshop on Web Quality (WebQuality 12) | English | 2012 | The online encyclopedia Wikipedia is a successful example of the increasing popularity of user generated content on the Web. Despite its success, Wikipedia is often criticized for containing low-quality information, which is mainly attributed to its core policy of being open for editing by everyone. The identification of low-quality information is an important task since Wikipedia has become the primary source of knowledge for a huge number of people around the world. Previous research on quality assessment in Wikipedia either investigates only small samples of articles, or else focuses on single quality aspects, like accuracy or formality. This paper targets the investigation of quality flaws, and presents the first complete breakdown of Wikipedia's quality flaw structure. We conduct an extensive exploratory analysis, which reveals (1) the quality flaws that actually exist, (2) the distribution of flaws in Wikipedia, and (3) the extent of flawed content. An important finding is that more than one in four English Wikipedia articles contains at least one quality flaw, 70% of which concern article verifiability. | 0 | 0 |
| On the Evolution of Quality Flaws and the Effectiveness of Cleanup Tags in the English Wikipedia | Maik Anderka, Benno Stein, Matthias Busse | Wikipedia Academy | English | 2012 | The improvement of information quality is a major task for the free online encyclopedia Wikipedia. Recent studies targeted the analysis and detection of specific quality flaws in Wikipedia articles. To date, quality flaws have been exclusively investigated in current Wikipedia articles, based on a snapshot representing the state of Wikipedia at a certain time. This paper goes further, and provides the first comprehensive breakdown of the evolution of quality flaws in Wikipedia. We utilize cleanup tags to analyze the quality flaws that have been tagged by the Wikipedia community in the English Wikipedia, from its launch in 2001 until 2011. This leads to interesting findings regarding (1) the development of Wikipedia's quality flaw structure and (2) the usage and the effectiveness of cleanup tags. Specifically, we show that inline tags are more effective than tag boxes, and provide statistics about the considerable volume of rare and non-specific cleanup tags. We expect that this work will support the Wikipedia community in making quality assurance activities more efficient. | 0 | 0 |
| Overview of the 1st International Competition on Quality Flaw Prediction in Wikipedia | Maik Anderka, Benno Stein | CLEF | English | 2012 | The paper overviews the task "Quality Flaw Prediction in Wikipedia" of the PAN'12 competition. An evaluation corpus is introduced which comprises 1,592,226 English Wikipedia articles, of which 208,228 have been tagged to contain one of ten important quality flaws. Moreover, the performance of three quality flaw classifiers is evaluated. | 0 | 0 |
| Predicting Quality Flaws in User-generated Content: The Case of Wikipedia | Maik Anderka, Benno Stein, Nedim Lipka | 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012) | English | 2012 | The detection and improvement of low-quality information is a key concern in Web applications that are based on user-generated content; a popular example is the online encyclopedia Wikipedia. Existing research on quality assessment of user-generated content deals with the classification as to whether the content is high-quality or low-quality. This paper goes one step further: it targets the prediction of quality flaws, this way providing specific indications in which respects low-quality content needs improvement. The prediction is based on user-defined cleanup tags, which are commonly used in many Web applications to tag content that has some shortcomings. We apply this approach to the English Wikipedia, which is the largest and most popular user-generated knowledge source on the Web. We present an automatic mining approach to identify the existing cleanup tags, which provides us with a training corpus of labeled Wikipedia articles. We argue that common binary or multiclass classification approaches are ineffective for the prediction of quality flaws and hence cast quality flaw prediction as a one-class classification problem. We develop a quality flaw model and employ a dedicated machine learning approach to predict Wikipedia's most important quality flaws. Since in the Wikipedia setting the acquisition of significant test data is intricate, we analyze the effects of a biased sample selection. In this regard we illustrate the classifier effectiveness as a function of the flaw distribution in order to cope with the unknown (real-world) flaw-specific class imbalances. The flaw prediction performance is evaluated with 10,000 Wikipedia articles that have been tagged with the ten most frequent quality flaws: provided test data with little noise, four flaws can be detected with a precision close to 1. | 0 | 0 |
| A multimethod study of information quality in wiki collaboration | Gerald C. Kane | ACM Trans. Manage. Inf. Syst. | English | 2011 | | 0 | 0 |
| Detection of Text Quality Flaws as a One-class Classification Problem | Maik Anderka, Benno Stein, Nedim Lipka | 20th ACM Conference on Information and Knowledge Management (CIKM 11) | English | 2011 | For Web applications that are based on user generated content the detection of text quality flaws is a key concern. Our research contributes to automatic quality flaw detection. In particular, we propose to cast the detection of text quality flaws as a one-class classification problem: we are given only positive examples (= texts containing a particular quality flaw) and decide whether or not an unseen text suffers from this flaw. We argue that common binary or multiclass classification approaches are ineffective in here, and we underpin our approach by a real-world application: we employ a dedicated one-class learning approach to determine whether a given Wikipedia article suffers from certain quality flaws. Since in the Wikipedia setting the acquisition of sensible test data is quite intricate, we analyze the effects of a biased sample selection. In addition, we illustrate the classifier effectiveness as a function of the flaw distribution in order to cope with the unknown (real-world) flaw-specific class imbalances. Altogether, provided test data with little noise, four from ten important quality flaws in Wikipedia can be detected with a precision close to 1. | 0 | 0 |
| Information Quality in Wikipedia: The Effects of Group Composition and Task Conflict | Ofer Arazy, Oded Nov, Raymond Patterson, Lisa Yeo | J. Manage. Inf. Syst. | English | 2011 | | 0 | 1 |
| Towards automatic quality assurance in Wikipedia | Maik Anderka, Benno Stein, Nedim Lipka | 20th International Conference on World Wide Web (WWW 11) | English | 2011 | Featured articles in Wikipedia stand for high information quality, and it has been found interesting to researchers to analyze whether and how they can be distinguished from "ordinary" articles. Here we point out that article discrimination falls far short of writer support or automatic quality assurance: Featured articles are not identified, but are made. Following this motto we compile a comprehensive list of information quality flaws in Wikipedia, model them according to the latest state of the art, and devise one-class classification technology for their identification. | 0 | 0 |
| Identifying featured articles in wikipedia: writing style matters | Nedim Lipka, Benno Stein | World Wide Web | English | 2010 | | 0 | 1 |
| Mining the Factors Affecting the Quality of Wikipedia Articles | Kewen Wu, Qinghua Zhu, Yuxiang Zhao, Hua Zheng | ISME | English | 2010 | | 0 | 0 |
| Size matters: word count as a measure of quality on wikipedia | Joshua E. Blumenstock | World Wide Web | English | 2008 | | 0 | 0 |
| Assessing information quality of a community-based encyclopedia | Besiki Stvilia, Michael B. Twidale, Linda C. Smith, Les Gasser | Proceedings of the International Conference on Information Quality | English | 2005 | Effective information quality analysis needs powerful yet easy ways to obtain metrics. The English version of Wikipedia provides an extremely interesting yet challenging case for the study of Information Quality dynamics at both macro and micro levels. We propose seven IQ metrics which can be evaluated automatically and test the set on a representative sample of Wikipedia content. The methodology of the metrics construction and the results of tests, along with a number of statistical characterizations of Wikipedia articles, their content construction, process metadata and social context are reported. | 5 | 4 |
