Browse wiki

Jump to: navigation, search
Caching search engine results over incremental indices
Abstract A Web search engine must update its index A Web search engine must update its index periodically to incorporate changes to the Web. We argue in this paper that index updates fundamentally impact the design of search engine result caches, a performance-critical component of modern search engines. Index updates lead to the problem of cache invalidation: invalidating cached entries of queries whose results have changed. Naïve approaches, such as flushing the entire cache upon every index update, lead to poor performance and in fact, render caching futile when the frequency of updates is high. Solving the invalidation problem efficiently corresponds to predicting accurately which queries will produce different results if re-evaluated, given the actual changes to the index. To obtain this property, we propose a framework for developing invalidation predictors and define metrics to evaluate invalidation schemes. We describe concrete predictors using this framework and compare them against a baseline that uses a cache invalidation scheme based on time-to-live (TTL). Evaluation over Wikipedia documents using a query log from the Yahoo search engine shows that selective invalidation of cached search results can lower the number of unnecessary query evaluations by as much as 30% compared to a baseline scheme, while returning results of similar freshness. In general, our predictors enable fewer unnecessary invalidations and fewer stale results compared to a TTL-only scheme for similar freshness of results.y scheme for similar freshness of results.
Abstractsub A Web search engine must update its index A Web search engine must update its index periodically to incorporate changes to the Web. We argue in this paper that index updates fundamentally impact the design of search engine result caches, a performance-critical component of modern search engines. Index updates lead to the problem of cache invalidation: invalidating cached entries of queries whose results have changed. Naïve approaches, such as flushing the entire cache upon every index update, lead to poor performance and in fact, render caching futile when the frequency of updates is high. Solving the invalidation problem efficiently corresponds to predicting accurately which queries will produce different results if re-evaluated, given the actual changes to the index. To obtain this property, we propose a framework for developing invalidation predictors and define metrics to evaluate invalidation schemes. We describe concrete predictors using this framework and compare them against a baseline that uses a cache invalidation scheme based on time-to-live (TTL). Evaluation over Wikipedia documents using a query log from the Yahoo search engine shows that selective invalidation of cached search results can lower the number of unnecessary query evaluations by as much as 30% compared to a baseline scheme, while returning results of similar freshness. In general, our predictors enable fewer unnecessary invalidations and fewer stale results compared to a TTL-only scheme for similar freshness of results.y scheme for similar freshness of results.
Bibtextype inproceedings  +
Doi 10.1145/1835449.1835466  +
Has author Blanco R. + , Bortnikov E. + , Junqueira F.P. + , Lempel R. + , Telloli L. + , Hugo Zaragoza +
Has extra keyword Cache invalidation + , Critical component + , Index update + , Poor performance + , Query evaluation + , Query logs + , Real-time indexing + , Search engine results + , Search results + , Time-to-live + , Web search engines + , Wikipedia + , Engines + , Indexing (of information) + , Search engine + , Websites + , Information retrieval +
Has keyword Real-time indexing + , Search engine caching +
Isbn 9781605588964  +
Language English +
Number of citations by publication 0  +
Number of references by publication 0  +
Pages 82–89  +
Published in SIGIR 2010 Proceedings - 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval +
Title Caching search engine results over incremental indices +
Type conference paper  +
Year 2010 +
Creation dateThis property is a special property in this wiki. 7 November 2014 06:14:39  +
Categories Publications without license parameter  + , Publications without remote mirror parameter  + , Publications without archive mirror parameter  + , Publications without paywall mirror parameter  + , Conference papers  + , Publications without references parameter  + , Publications  +
Modification dateThis property is a special property in this wiki. 7 November 2014 06:14:39  +
DateThis property is a special property in this wiki. 2010  +
hide properties that link here 
Caching search engine results over incremental indices + Title
 

 

Enter the name of the page to start browsing from.