Ranking Wikipedia article's data quality by learning dimension distributions


Ranking Wikipedia article's data quality by learning dimension distributions is a 2014 journal article written in English by Han J. and Chen K., published in the International Journal of Information Quality.

Abstract

As the largest free user-generated knowledge repository, Wikipedia has attracted great attention in recent years for its data quality, and automatic assessment of the data quality of Wikipedia articles is a pressing concern. We observe that each Wikipedia quality class exhibits specific characteristics along different first-class quality dimensions, including accuracy, completeness, consistency and minimality. We propose to extract quality dimension values from an article's content and editing history using a dynamic Bayesian network (DBN) and information extraction techniques. Next, we employ multivariate Gaussian distributions to model the quality dimension distributions of each quality class, and combine multiple trained classifiers to predict an article's quality class, which distinguishes different quality classes effectively and robustly. Experiments demonstrate that our approach achieves good performance.
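
The class-conditional Gaussian step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the four quality dimension values have already been extracted as numbers (the DBN and information extraction stage is omitted), it uses a single maximum-likelihood classifier rather than a combination of trained classifiers, and the class labels `high_quality` and `low_quality`, the helper names, and the synthetic training data are all hypothetical.

```python
import math
import random

# Dimension order assumed for every vector (names taken from the abstract).
DIMENSIONS = ("accuracy", "completeness", "consistency", "minimality")

def fit_gaussian(samples, ridge=1e-6):
    """Estimate the mean vector and sample covariance matrix of a list of vectors."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(d)] for i in range(d)]
    for i in range(d):
        cov[i][i] += ridge  # small ridge keeps the matrix positive definite
    return mean, cov

def cholesky(a):
    """Lower-triangular L with L * L^T == a, for symmetric positive definite a."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def log_density(x, mean, cov):
    """Log of the multivariate normal density at x."""
    d = len(x)
    low = cholesky(cov)
    diff = [x[i] - mean[i] for i in range(d)]
    # Forward substitution solves L z = diff; z . z is the Mahalanobis term.
    z = []
    for i in range(d):
        z.append((diff[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))

def train(labelled):
    """labelled maps a class label to the list of dimension vectors in that class."""
    return {label: fit_gaussian(vectors) for label, vectors in labelled.items()}

def predict(models, x):
    """Return the class whose Gaussian assigns x the highest log density."""
    return max(models, key=lambda label: log_density(x, *models[label]))

def synthetic_class(rng, centre, n=200, spread=0.05):
    """Hypothetical stand-in for real extracted dimension values."""
    return [[rng.gauss(c, spread) for c in centre] for _ in range(n)]

rng = random.Random(0)
training = {
    "high_quality": synthetic_class(rng, [0.9, 0.9, 0.85, 0.8]),
    "low_quality": synthetic_class(rng, [0.4, 0.2, 0.5, 0.6]),
}
models = train(training)
print(predict(models, [0.88, 0.92, 0.83, 0.79]))
print(predict(models, [0.35, 0.25, 0.55, 0.60]))
```

Each query vector is assigned to the class whose fitted distribution explains it best. The sketch weights all classes equally; class priors could be added to the log density inside `predict` if the class sizes were known.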

