Automatic quality assessment of content created collaboratively by web communities: a case study of Wikipedia
|Author(s)||Daniel H. Dalip, Marcos A. Gonçalves, Marco Cristo, Pável Calado|
|Keyword(s)||Machine learning, Quality assessment, SVM, Wikipedia (Extra: Amount of information, Combination method, Free access, Human knowledge, Link analysis, Machine learning techniques, Quality indicators, Quality prediction, Web community, Feature extraction, Learning algorithms, Robot learning, Support vector machines, Digital libraries)|
Automatic quality assessment of content created collaboratively by web communities: a case study of Wikipedia is a 2009 publication written in English by Daniel H. Dalip, Marcos A. Gonçalves, Marco Cristo, and Pável Calado.
The old dream of a universal repository containing all human knowledge and culture is becoming possible through the Internet and the Web. Moreover, this is happening with the direct, collaborative participation of people. Wikipedia is a great example. It is an enormous repository of information that is free to access and edit, created collaboratively by its community. However, this large amount of information, made available democratically and virtually without any control, raises questions about its relative quality. In this work we explore a significant number of quality indicators, some of them proposed by us and used here for the first time, and study their capability to assess the quality of Wikipedia articles. Furthermore, we explore machine learning techniques to combine these quality indicators into a single assessment judgment. Through experiments, we show that the most important quality indicators are the easiest ones to extract, namely, textual features related to length, structure and style. We were also able to determine which indicators did not contribute significantly to the quality assessment. These were, coincidentally, the most complex features, such as those based on link analysis. Finally, we compare our combination method with a state-of-the-art solution and show significant improvements in terms of effective quality prediction.
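As a rough illustration of what "textual features related to length, structure and style" could look like in practice, the sketch below computes a handful of simple indicators from raw wikitext. It is not the authors' feature set: the function name `extract_text_features`, the feature names, and the regular expressions are all assumptions made for this example. In the setting the abstract describes, values like these would be fed to a learning algorithm (the keywords mention SVM) rather than interpreted on their own.

```python
import re


def extract_text_features(wikitext: str) -> dict:
    """Compute a few illustrative length, structure, and style indicators.

    Hypothetical helper: the feature names and patterns are assumptions,
    not the indicators used in the publication.
    """
    # Length: count word-like tokens.
    words = re.findall(r"\w+", wikitext)
    # Style: crude sentence split on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", wikitext) if s.strip()]
    # Structure: MediaWiki headings look like "== Title ==" on their own line.
    sections = re.findall(r"^==+[^=\n]+==+[ \t]*$", wikitext, flags=re.MULTILINE)
    # Structure: internal links look like "[[Target]]" or "[[Target|label]]".
    links = re.findall(r"\[\[[^\]]+\]\]", wikitext)
    return {
        "char_count": len(wikitext),
        "word_count": len(words),
        "section_count": len(sections),
        "link_count": len(links),
        "avg_sentence_words": len(words) / len(sentences) if sentences else 0.0,
    }


sample = "== History ==\nWikipedia is a [[wiki]]. It is free to edit.\n"
features = extract_text_features(sample)
print(features)
```

Only the standard library is used here so the sketch stays self-contained. The sentence splitter is deliberately naive (it counts heading text as part of the first sentence), which is acceptable for an illustration but would need refinement for real articles.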
Cited by
This publication has 3 citations. Only those publications available in WikiPapers are shown here:
- GreenWiki: a tool to support users' assessment of the quality of Wikipedia articles
- Um método automático para estimativa da qualidade de enciclopédias colaborativas on-line: um estudo de caso sobre a Wikipédia
- What We Know About Wikipedia: A Review of the Literature Analyzing the Project(s)