Ranking Wikipedia article's data quality by learning dimension distributions
Abstract Wikipedia is the largest free user-generated knowledge repository, and its data quality has attracted great attention in recent years. Automatic assessment of a Wikipedia article's data quality is a pressing concern. We observe that every Wikipedia quality class exhibits its own characteristics along different first-class quality dimensions, including accuracy, completeness, consistency and minimality. We propose to extract quality dimension values from an article's content and editing history using a dynamic Bayesian network (DBN) and information extraction techniques. Next, we employ multivariate Gaussian distributions to model the quality dimension distributions for each quality class, and combine multiple trained classifiers to predict an article's quality class, which can distinguish different quality classes effectively and robustly. Experiments demonstrate that our approach performs well.
Bibtextype article  +
Doi 10.1504/IJIQ.2014.064056  +
Has author Jangwhan Han + , Chen K. +
Has extra keyword Data quality + , Ensemble learning + , Multivariate Gaussian Distributions + , Quality dimension + , Wikipedia +
Has keyword Data quality + , Ensemble learning + , Multivariate Gaussian distribution + , Quality dimensions + , Wikipedia +
Issn 17510457  +
Issue 3  +
Language English +
Number of citations by publication 0  +
Number of references by publication 0  +
Pages 207–227  +
Published in International Journal of Information Quality +
Title Ranking Wikipedia article's data quality by learning dimension distributions +
Type journal article  +
Volume 3  +
Year 2014 +
Creation date 7 November 2014 14:12:00  +
Categories Publications without license parameter  + , Publications without remote mirror parameter  + , Publications without archive mirror parameter  + , Publications without paywall mirror parameter  + , Journal articles  + , Publications without references parameter  + , Publications  +
Modification date 7 November 2014 14:12:00  +
Date 2014  +
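
The classification step described in the abstract (one multivariate Gaussian per quality class over the quality dimension values) can be illustrated roughly as follows. This is a minimal sketch, not the authors' implementation: the function names, the regularization term `reg`, the class labels and the toy data are all assumptions made for illustration, and the DBN-based extraction of dimension values and the combination of multiple classifiers are not covered.

```python
import numpy as np

def fit_class_gaussians(X, y, reg=1e-6):
    """Fit one multivariate Gaussian (mean, covariance) per quality class.

    X: (n_articles, n_dimensions) array of quality dimension values.
    y: (n_articles,) array of quality class labels.
    reg: small diagonal term (an assumption) that keeps covariances invertible.
    """
    models = {}
    for label in np.unique(y):
        rows = X[y == label]
        mean = rows.mean(axis=0)
        cov = np.cov(rows, rowvar=False) + reg * np.eye(X.shape[1])
        models[label] = (mean, cov)
    return models

def log_density(x, mean, cov):
    """Log of the multivariate Gaussian density at x."""
    d = mean.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def predict_class(x, models):
    """Return the class whose Gaussian assigns x the highest log-density."""
    return max(models, key=lambda label: log_density(x, *models[label]))

# Hypothetical toy data: four dimension values per article
# (accuracy, completeness, consistency, minimality), two classes.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.9, 0.05, size=(50, 4)),  # synthetic "FA"-like articles
    rng.normal(0.2, 0.05, size=(50, 4)),  # synthetic "Stub"-like articles
])
y = np.array(["FA"] * 50 + ["Stub"] * 50)
models = fit_class_gaussians(X, y)
```

Calling `predict_class(np.full(4, 0.9), models)` on this toy data selects the class whose fitted distribution best explains the given dimension values. Equal class priors are assumed here; a prior term could be added to the log-density if class frequencies differ.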