Compact full-text indexing of versioned document collections

From WikiPapers
Revision as of 06:16, November 7, 2014 by Nemo bis (Talk | contribs) (CSV import from another resource for wiki stuff; all data is PD-ineligible, abstracts quoted under quotation right. Skipping when title already exists. Sorry for authors and references to be postprocessed, please edit and create redirects.)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Compact full-text indexing of versioned document collections is a 2009 conference paper written in English by He J., Yan H., Suel T. and published in International Conference on Information and Knowledge Management, Proceedings.

[edit] Abstract

We study the problem of creating highly compressed full-text index structures for versioned document collections, that is, collections that contain multiple versions of each document. Important examples of such collections are Wikipedia or the web page archive maintained by the Internet Archive. A straightforward indexing approach would simply treat each document version as a separate document, such that index size scales linearly with the number of versions. However, several authors have recently studied approaches that exploit the significant similarities between different versions of the same document to obtain much smaller index sizes. In this paper, we propose new techniques for organizing and compressing inverted index structures for such collections. We also perform a detailed experimental comparison of new techniques and the existing techniques in the literature. Our results on an archive of the English version of Wikipedia, and on a subset of the Internet Archive collection, show significant benefits over previous approaches. Copyright 2009 ACM.

[edit] References

This section requires expansion. Please, help!

Cited by

Probably, this publication is cited by others, but there are no articles available for them in WikiPapers. Cited 6 time(s)