
Nedim Lipka



Nedim Lipka is an author.

Publications

Only those publications related to wikis are shown here.
Each entry lists the title, keywords, venue, language, and date, followed by the abstract.

Title: Predicting Quality Flaws in User-generated Content: The Case of Wikipedia
Keywords: User-generated Content Analysis, Information quality, Wikipedia, Quality Flaw Prediction, One-class Classification
Published in: 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012)
Language: English
Date: 2012
Abstract: The detection and improvement of low-quality information is a key concern in Web applications that are based on user-generated content; a popular example is the online encyclopedia Wikipedia. Existing research on quality assessment of user-generated content deals with classifying content as either high-quality or low-quality. This paper goes one step further: it targets the prediction of quality flaws, thereby providing specific indications of the respects in which low-quality content needs improvement. The prediction is based on user-defined cleanup tags, which are commonly used in many Web applications to tag content that has some shortcomings. We apply this approach to the English Wikipedia, the largest and most popular user-generated knowledge source on the Web. We present an automatic mining approach that identifies the existing cleanup tags and provides us with a training corpus of labeled Wikipedia articles. We argue that common binary or multiclass classification approaches are ineffective for the prediction of quality flaws and hence cast quality flaw prediction as a one-class classification problem. We develop a quality flaw model and employ a dedicated machine learning approach to predict Wikipedia's most important quality flaws. Since acquiring significant test data is intricate in the Wikipedia setting, we analyze the effects of a biased sample selection. In this regard we illustrate classifier effectiveness as a function of the flaw distribution, in order to cope with the unknown (real-world) flaw-specific class imbalances. The flaw prediction performance is evaluated on 10,000 Wikipedia articles that have been tagged with the ten most frequent quality flaws: given test data with little noise, four flaws can be detected with a precision close to 1.
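
A minimal sketch of the one-class setup the abstract describes: train only on positive examples (articles tagged with a given cleanup flaw) and decide for unseen articles whether they match that flaw's profile. The features and toy data below are invented placeholders, not the paper's quality flaw model.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

def extract_features(text):
    """Toy surface features; the paper's flaw model is far richer."""
    return [len(text), text.count("[["), text.count("<ref")]

# Positive examples only: articles carrying one cleanup tag,
# e.g. {{unreferenced}}.
tagged = [
    "A long unsourced claim with a [[wiki link]] and more prose ...",
    "Another article body that lacks <ref>-style citations ...",
    "A short unsourced stub ...",
]
X_pos = np.array([extract_features(t) for t in tagged], dtype=float)

# The one-class SVM learns a boundary around the positive class alone;
# nu bounds the fraction of training points treated as outliers.
clf = make_pipeline(StandardScaler(), OneClassSVM(kernel="rbf", nu=0.1))
clf.fit(X_pos)

unseen = "A new article to check for this flaw ..."
x = np.array([extract_features(unseen)], dtype=float)
print(clf.predict(x))  # +1: matches the flaw's profile, -1: does not
```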

Title: Detection of Text Quality Flaws as a One-class Classification Problem
Keywords: Information quality, Wikipedia, Quality Flaw Prediction, One-class Classification
Published in: 20th ACM Conference on Information and Knowledge Management (CIKM 2011)
Language: English
Date: 2011
Abstract: For Web applications that are based on user-generated content, the detection of text quality flaws is a key concern. Our research contributes to automatic quality flaw detection. In particular, we propose to cast the detection of text quality flaws as a one-class classification problem: we are given only positive examples (texts containing a particular quality flaw) and decide whether or not an unseen text suffers from this flaw. We argue that common binary or multiclass classification approaches are ineffective here, and we underpin our approach with a real-world application: we employ a dedicated one-class learning approach to determine whether a given Wikipedia article suffers from certain quality flaws. Since acquiring sensible test data is quite intricate in the Wikipedia setting, we analyze the effects of a biased sample selection. In addition, we illustrate classifier effectiveness as a function of the flaw distribution, in order to cope with the unknown (real-world) flaw-specific class imbalances. Altogether, given test data with little noise, four of ten important quality flaws in Wikipedia can be detected with a precision close to 1.
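
The abstract's point about reporting effectiveness as a function of the flaw distribution can be made concrete: once a classifier's true-positive and false-positive rates are known, precision under any assumed real-world flaw prevalence follows directly. The rates in this sketch are invented placeholders, not numbers from the paper.

```python
# Precision as a function of the (unknown) flaw prevalence pi, computed
# from rates that do not depend on the class balance of the test sample.
def precision_at_prevalence(tpr, fpr, pi):
    tp = pi * tpr          # expected true-positive mass
    fp = (1 - pi) * fpr    # expected false-positive mass
    return tp / (tp + fp)

tpr, fpr = 0.90, 0.05  # placeholder rates
for pi in (0.5, 0.1, 0.01):
    p = precision_at_prevalence(tpr, fpr, pi)
    print(f"prevalence {pi:5.2f} -> precision {p:.3f}")
# Precision drops sharply as a flaw becomes rare, which is why the
# evaluation is reported against the flaw distribution.
```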

Title: Towards Automatic Quality Assurance in Wikipedia
Keywords: Wikipedia, Information quality, Flaw Detection
Published in: Proceedings of the 20th International Conference on World Wide Web (WWW 2011)
Language: English
Date: 2011
Abstract: Featured articles in Wikipedia stand for high information quality, and researchers have found it interesting to analyze whether and how they can be distinguished from "ordinary" articles. Here we point out that article discrimination falls far short of writer support or automatic quality assurance: featured articles are not identified, they are made. Following this motto, we compile a comprehensive list of information quality flaws in Wikipedia, model them according to the latest state of the art, and devise one-class classification technology for their identification.

Title: Evaluating Cross-language Explicit Semantic Analysis and Cross Querying
Published in: Lecture Notes in Computer Science
Language: English
Date: 2010
Abstract: This paper describes our participation in the TEL@CLEF task of the CLEF 2009 ad-hoc track. The task is to retrieve items from various multilingual collections of library catalog records that are relevant to a user's query. Two different strategies are employed: (i) Cross-Language Explicit Semantic Analysis (CL-ESA), where the library catalog records and the queries are represented in a multilingual concept space that is spanned by aligned Wikipedia articles, and (ii) a Cross Querying approach, where a query is translated into all target languages using Google Translate and the obtained rankings are combined. The evaluation shows that both strategies outperform the monolingual baseline and achieve comparable results. Furthermore, inspired by the Generalized Vector Space Model, we present a formal definition and an alternative interpretation of the CL-ESA model. This interpretation is interesting for real-world retrieval applications since it reveals how the computational effort for CL-ESA can be shifted from the query phase to a preprocessing phase.
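
A compact illustration of the ESA idea behind CL-ESA: a text is represented by its similarities to a fixed set of concept articles, and texts in different languages become comparable because position i in each language's concept list refers to the same aligned Wikipedia article. The two-concept "articles" below are invented placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# One aligned concept list per language; index i denotes the same
# underlying Wikipedia article in both lists.
concepts_en = ["music theory harmony chords", "football league match goal"]
concepts_de = ["Musiktheorie Harmonie Akkorde", "Fussball Liga Spiel Tor"]

def esa_map(texts, concept_articles):
    """Map texts into the concept space spanned by the articles."""
    vec = TfidfVectorizer().fit(concept_articles)
    return cosine_similarity(vec.transform(texts),
                             vec.transform(concept_articles))

query_en = esa_map(["chords and harmony"], concepts_en)
doc_de = esa_map(["Akkorde und Harmonie"], concepts_de)

# Both vectors live in the same concept space, so cross-language
# relevance is a plain similarity between concept vectors.
print(cosine_similarity(query_en, doc_de))
```

Since the concept vectors of a collection can be computed once during indexing, query-time scoring reduces to vector comparisons, which is the shift from the query phase to preprocessing that the abstract mentions.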

Title: Identifying featured articles in Wikipedia: Writing style matters
Keywords: Domain transfer, Information quality, Wikipedia
Published in: Proceedings of the 19th International Conference on World Wide Web (WWW '10)
Language: English
Date: 2010
Abstract: Wikipedia provides an information quality assessment model with criteria for human peer reviewers to identify featured articles. For this classification task, "Is an article featured or not?", we present a machine learning approach that exploits an article's character trigram distribution. Our approach differs from existing research in that it targets writing style rather than meta features such as the edit history. The approach is robust, straightforward to implement, and outperforms existing solutions. We underpin these claims with an experiment design in which, among other things, domain transferability is analyzed. The achieved performance in terms of the F-measure for featured articles is 0.964 within a single Wikipedia domain and 0.880 in a domain transfer situation.
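
The trigram approach lends itself to a short sketch: character 3-gram frequencies as features with a standard linear classifier on top. The texts and labels below are toy placeholders; the paper's corpus, model, and evaluation are far more substantial.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "The treaty was signed in 1848, ending hostilities between ...",
    "its kinda a big deal tbh, lots of stuff happened here",
    "Subsequent analyses confirmed the earlier measurements ...",
    "this page needs moar sources lol",
]
labels = [1, 0, 1, 0]  # 1 = featured-like style, 0 = not (toy labels)

# analyzer="char" with ngram_range=(3, 3) yields exactly the character
# trigram distribution the abstract describes; tf-idf weighting is one
# reasonable normalization, not necessarily the paper's choice.
model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(3, 3)),
    LogisticRegression(),
)
model.fit(texts, labels)
print(model.predict(["The committee reviewed the proposal in detail."]))
```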