Detection of Text Quality Flaws as a One-class Classification Problem
|Author(s)||Maik Anderka, Benno Stein, Nedim Lipka|
|Published in||20th ACM Conference on Information and Knowledge Management (CIKM 11)|
|Keyword(s)||Information Quality, Wikipedia, Quality Flaw Prediction, One-class Classification|
Detection of Text Quality Flaws as a One-class Classification Problem is a 2011 conference paper written in English by Maik Anderka, Benno Stein, and Nedim Lipka, published in the 20th ACM Conference on Information and Knowledge Management (CIKM 11).
For Web applications that are based on user-generated content, the detection of text quality flaws is a key concern. Our research contributes to automatic quality flaw detection. In particular, we propose to cast the detection of text quality flaws as a one-class classification problem: we are given only positive examples (= texts containing a particular quality flaw) and decide whether or not an unseen text suffers from this flaw. We argue that common binary or multiclass classification approaches are ineffective here, and we underpin our approach with a real-world application: we employ a dedicated one-class learning approach to determine whether a given Wikipedia article suffers from certain quality flaws. Since in the Wikipedia setting the acquisition of sensible test data is quite intricate, we analyze the effects of a biased sample selection. In addition, we illustrate the classifier effectiveness as a function of the flaw distribution in order to cope with the unknown (real-world) flaw-specific class imbalances. Altogether, provided test data with little noise, four of ten important quality flaws in Wikipedia can be detected with a precision close to 1.
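The one-class setting described in the abstract can be illustrated with a minimal sketch. This is not the learning approach used in the paper: it is a toy centroid-and-radius classifier, and the class name `CentroidOneClassClassifier`, the `quantile` parameter, and the two-dimensional feature vectors are all assumptions made for illustration. The point is only that the model is fitted on positive examples (flawed texts) alone and then accepts or rejects an unseen text.

```python
import math


class CentroidOneClassClassifier:
    """Toy one-class classifier (illustrative, not the paper's method).

    It learns the centroid of the positive examples and accepts an
    unseen vector if it lies within a radius derived from the
    training distances.
    """

    def __init__(self, quantile=1.0):
        # Fraction of training examples that must fall inside the radius.
        self.quantile = quantile
        self.centroid = None
        self.radius = None

    def _distance(self, x):
        # Euclidean distance between x and the learned centroid.
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, self.centroid)))

    def fit(self, positives):
        # Only positive examples (texts with the flaw) are available.
        n = len(positives)
        dim = len(positives[0])
        self.centroid = [sum(x[i] for x in positives) / n for i in range(dim)]
        distances = sorted(self._distance(x) for x in positives)
        index = min(n - 1, max(0, math.ceil(self.quantile * n) - 1))
        self.radius = distances[index]
        return self

    def predict(self, x):
        # True means "this text appears to have the flaw".
        return self._distance(x) <= self.radius


# Hypothetical feature vectors for articles tagged with one flaw,
# e.g. (share of sentences with a citation, references per 1000 words).
flawed_articles = [(0.0, 0.1), (0.05, 0.0), (0.02, 0.2), (0.0, 0.0), (0.1, 0.1)]

clf = CentroidOneClassClassifier(quantile=1.0).fit(flawed_articles)
looks_flawed = clf.predict((0.03, 0.05))      # close to the training examples
looks_unflawed = clf.predict((0.9, 5.0))      # far from the training examples
```

No negative examples are needed for fitting, which matches the situation the abstract describes: articles tagged with a flaw are available, while a reliable sample of articles known to be free of the flaw is hard to obtain.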