Compact full-text indexing of versioned document collections
Abstract We study the problem of creating highly compressed full-text index structures for versioned document collections, that is, collections that contain multiple versions of each document. Important examples of such collections are Wikipedia or the web page archive maintained by the Internet Archive. A straightforward indexing approach would simply treat each document version as a separate document, such that index size scales linearly with the number of versions. However, several authors have recently studied approaches that exploit the significant similarities between different versions of the same document to obtain much smaller index sizes. In this paper, we propose new techniques for organizing and compressing inverted index structures for such collections. We also perform a detailed experimental comparison of new techniques and the existing techniques in the literature. Our results on an archive of the English version of Wikipedia, and on a subset of the Internet Archive collection, show significant benefits over previous approaches. Copyright 2009 ACM.
Bibtextype inproceedings  +
Doi 10.1145/1645953.1646008  +
Has author He J. + , Yan H. + , Suel T. +
Has extra keyword Document collection + , Document version + , Experimental comparison + , Full-text index + , Internet archive + , Inverted index compression + , Inverted index structures + , Inverted indices + , Size scale + , Text-indexing + , Web archives + , Web page + , Wikipedia + , Indexing (of information) + , Information retrieval + , Internet + , Knowledge management + , Search engine + , World Wide Web +
Has keyword Inverted index + , Inverted index compression + , Search engine + , Versioned documents + , Web archives + , Wikipedia +
Isbn 9781605585123  +
Language English +
Number of citations by publication 0  +
Number of references by publication 0  +
Pages 415–424  +
Published in International Conference on Information and Knowledge Management, Proceedings +
Title Compact full-text indexing of versioned document collections +
Type conference paper  +
Year 2009 +
Creation date 7 November 2014 06:16:34  +
Categories Publications without license parameter  + , Publications without remote mirror parameter  + , Publications without archive mirror parameter  + , Publications without paywall mirror parameter  + , Conference papers  + , Publications without references parameter  + , Publications  +
Modification date 7 November 2014 06:16:34  +
Date 2009  +