UMass at TREC 2010 web track: Term dependence, spam filtering and quality bias
Abstract Many existing retrieval approaches treat all the documents in the collection equally and do not take into account the content quality of the retrieved documents. In our submissions for the TREC 2010 Web Track, we utilize quality-biased ranking methods that aim to promote documents that potentially contain high-quality content and to penalize spam and low-quality documents. Our experiments with the ad hoc web topics from TREC 2010 show that features such as the spamminess of the document (as computed by the Waterloo team [6]) and the readability of the document (modeled by the fraction of stopwords in the document) are very important for improving precision at the top ranks. Promoting high-quality Wikipedia pages leads to further improvements in retrieval performance. In addition, we found that using Wikipedia as a high-quality document collection for query expansion can ameliorate some of the negative effects of performing pseudo-relevance feedback from a noisy web collection such as ClueWeb09.
Bibtextype inproceedings
Has author Bendersky M., Fisher D., Croft W.B.
Has extra keyword Content qualities, Document collection, High quality, Low qualities, Pseudo relevance feedback, Query expansion, Ranking methods, Retrieval performance, Retrieved documents, Spam filtering, Web collections, Wikipedia, Internet, Information retrieval
Issn 1048776X
Language English
Number of citations by publication 0
Number of references by publication 0
Published in NIST Special Publication
Title UMass at TREC 2010 web track: Term dependence, spam filtering and quality bias
Type conference paper
Year 2010
Creation date 8 November 2014 06:46:32
Categories Publications without keywords parameter, Publications without license parameter, Publications without DOI parameter, Publications without remote mirror parameter, Publications without archive mirror parameter, Publications without paywall mirror parameter, Conference papers, Publications without references parameter, Publications
Modification date 8 November 2014 06:46:32
Date 2010