Ranking Wikipedia article's data quality by learning dimension distributions
|Author(s)||Han J., Chen K.|
|Published in||International Journal of Information Quality|
|Keyword(s)||Data quality, Ensemble learning, Multivariate Gaussian distribution, Quality dimensions, Wikipedia|
Ranking Wikipedia article's data quality by learning dimension distributions is a 2014 journal article written in English by Han J., Chen K. and published in International Journal of Information Quality.
As the largest free user-generated knowledge repository, Wikipedia has attracted great attention to its data quality in recent years, and automatic assessment of a Wikipedia article's data quality is a pressing concern. We observe that every Wikipedia quality class exhibits a specific characteristic along different first-class quality dimensions, including accuracy, completeness, consistency and minimality. We propose to extract quality dimension values from an article's content and editing history using dynamic Bayesian network (DBN) and information extraction techniques. Next, we employ multivariate Gaussian distributions to model the quality dimension distributions for each quality class, and combine multiple trained classifiers to predict an article's quality class, which distinguishes different quality classes effectively and robustly. Experiments demonstrate that our approach achieves good performance.
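The per-class Gaussian modeling described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each article is reduced to a vector of the four named dimension scores (accuracy, completeness, consistency, minimality), uses a diagonal covariance for simplicity, and uses invented example values; the paper additionally combines multiple trained classifiers, which is omitted here.

```python
import math

def fit_gaussian(vectors):
    """Fit a diagonal-covariance Gaussian (per-dimension mean and
    variance) to a class's quality-dimension vectors."""
    n = len(vectors)
    d = len(vectors[0])
    means = [sum(v[i] for v in vectors) / n for i in range(d)]
    # Floor the variance so a degenerate dimension cannot break the density.
    vars_ = [max(sum((v[i] - means[i]) ** 2 for v in vectors) / n, 1e-6)
             for i in range(d)]
    return means, vars_

def log_density(x, means, vars_):
    """Log-density of x under an axis-aligned multivariate Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * var) + (xi - m) ** 2 / var)
               for xi, m, var in zip(x, means, vars_))

# Hypothetical training vectors: [accuracy, completeness, consistency, minimality]
training = {
    "Featured": [[0.90, 0.95, 0.90, 0.85], [0.92, 0.90, 0.88, 0.90]],
    "Start":    [[0.40, 0.30, 0.50, 0.60], [0.35, 0.40, 0.45, 0.55]],
}
models = {cls: fit_gaussian(vecs) for cls, vecs in training.items()}

def predict(article_dims):
    """Assign the quality class whose Gaussian gives the highest log-density."""
    return max(models, key=lambda cls: log_density(article_dims, *models[cls]))
```

A new article's dimension vector is then scored under every class model, and the maximum-likelihood class is returned, e.g. `predict([0.9, 0.9, 0.9, 0.88])` falls in the "Featured" region of the toy data above.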
This publication is probably cited by others, but no citing articles are available in WikiPapers.