Analyzing and Predicting Quality Flaws in User-generated Content: The Case of Wikipedia
Abstract: Web applications that are based on user-generated content are often criticized for containing low-quality information; a popular example is the online encyclopedia Wikipedia. The major points of criticism pertain to the accuracy, neutrality, and reliability of information. The identification of low-quality information is an important task, since for a huge number of people around the world it has become a habit to first visit Wikipedia in case of an information need. Existing research on quality assessment in Wikipedia either investigates only small samples of articles or else deals with the classification of content into high-quality or low-quality. This thesis goes further: it targets the investigation of quality flaws, thus providing specific indications of the respects in which low-quality content needs improvement. The original contributions of this thesis, which relate to the fields of user-generated content analysis, data mining, and machine learning, can be summarized as follows: (1) We propose the investigation of quality flaws in Wikipedia based on user-defined cleanup tags. Cleanup tags are commonly used in the Wikipedia community to tag content that has some shortcomings. Our approach is based on the hypothesis that each cleanup tag defines a particular quality flaw. (2) We provide the first comprehensive breakdown of Wikipedia's quality flaw structure. We present a flaw organization schema, and we conduct an extensive exploratory data analysis which reveals (a) the flaws that actually exist, (b) the distribution of flaws in Wikipedia, and (c) the extent of flawed content. (3) We present the first breakdown of Wikipedia's quality flaw evolution. We consider the entire history of the English Wikipedia from 2001 to 2012, which comprises more than 508 million page revisions, summing up to 7.9 TB. Our analysis reveals (a) how the incidence and the extent of flaws have evolved and (b) how the handling and the perception of flaws have changed over time. (4) We are the first to operationalize an algorithmic prediction of quality flaws in Wikipedia. We cast quality flaw prediction as a one-class classification problem, develop a tailored quality flaw model, and employ a dedicated one-class machine learning approach. A comprehensive evaluation based on human-labeled Wikipedia articles underlines the practical applicability of our approach.
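Contribution (4) casts quality flaw prediction as one-class classification: only articles tagged with a given cleanup tag (the flawed class) are available as training data, and unseen articles are scored for membership in that class. The following is a minimal sketch of that framing using scikit-learn's OneClassSVM; the two-dimensional feature vectors are synthetic placeholders, not the thesis's actual quality flaw model.

```python
# Sketch of one-class classification for quality flaw prediction.
# Assumption: each article is represented by a feature vector; here we
# use random synthetic features, not the thesis's tailored flaw model.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Hypothetical features of 200 articles carrying one specific cleanup
# tag -- the only labeled class available for training.
flawed_articles = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# nu bounds the fraction of training samples treated as outliers.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)
clf.fit(flawed_articles)

# Score unseen articles: +1 = looks like the flawed class, -1 = does not.
unseen = np.array([[0.1, -0.2],   # close to the training distribution
                   [8.0, 8.0]])   # far outside it
predictions = clf.predict(unseen)
print(predictions)
```

The design choice mirrors the setting described in the abstract: because articles without a cleanup tag cannot be assumed flawless, a conventional two-class classifier would train on unreliable negatives, whereas a one-class learner needs only the tagged (flawed) examples.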
Bibtextype: phdthesis
Has author: Maik Anderka
Has keyword: Information quality, Wikipedia, Quality Flaws, Quality Flaw Prediction
Has remote mirror: http://www.uni-weimar.de/medien/webis/publications/papers/anderka_2013.pdf
Language: English
Number of citations by publication: 0
Number of references by publication: 0
Peer-reviewed: Yes
Published in: Bauhaus-Universität Weimar, Germany
Related dataset: Wikimedia dumps, PAN Wikipedia quality flaw corpus 2012
Title: Analyzing and Predicting Quality Flaws in User-generated Content: The Case of Wikipedia
Type: doctoral thesis
Year: 2013
Creation date: 10 July 2013 09:06:05
Categories: Publications without license parameter, Publications without DOI parameter, Publications without archive mirror parameter, Publications without paywall mirror parameter, Doctoral theses, Publications without references parameter, Publications
Modification date: 20 September 2014 17:43:41
Redirect page: Analyzing and Predicting Quality Flaws in User-generated Content: The Case of Wikipedia
Date: 2013
Properties that link here:
Title: Analyzing and Predicting Quality Flaws in User-generated Content: The Case of Wikipedia