An analysis of systematic judging errors in information retrieval

From WikiPapers
Jump to: navigation, search

An analysis of systematic judging errors in information retrieval is a 2012 conference paper written in English by Kazai G., Craswell N., Yilmaz E., Tahaghoghi S.M.M. and published in ACM International Conference Proceeding Series.

[edit] Abstract

Test collections are powerful mechanisms for the evaluation and optimization of information retrieval systems. However, there is reported evidence that experiment outcomes can be affected by changes to the judging guidelines or changes in the judge population. This paper examines such effects in a web search setting, comparing the judgments of four groups of judges: NIST Web Track judges, untrained crowd workers and two groups of trained judges of a commercial search engine. Our goal is to identify systematic judging errors by comparing the labels contributed by the different groups, working under the same or different judging guidelines. In particular, we focus on detecting systematic differences in judging depending on specific characteristics of the queries and URLs. For example, we ask whether a given population of judges, working under a given set of judging guidelines, are more likely to consistently overrate Wikipedia pages than another group judging under the same instructions. Our approach is to identify judging errors with respect to a consensus set, a judged gold set and a set of user clicks. We further demonstrate how such biases can affect the training of retrieval systems.

[edit] References

This section requires expansion. Please, help!

Cited by

Probably, this publication is cited by others, but there are no articles available for them in WikiPapers. Cited 1 time(s)