Using the past to score the present: Extending term weighting models through Revision History Analysis
Abstract The generative process underlies many information retrieval models, notably statistical language models. Yet these models only examine one (current) version of the document, effectively ignoring the actual document generation process. We posit that a considerable amount of information is encoded in the document authoring process, and this information is complementary to the word occurrence statistics upon which most modern retrieval models are based. We propose a new term weighting model, Revision History Analysis (RHA), which uses the revision history of a document (e.g., the edit history of a page in Wikipedia) to redefine term frequency - a key indicator of document topic/relevance for many retrieval models and text processing tasks. We then apply RHA to document ranking by extending two state-of-the-art text retrieval models, namely, BM25 and the generative statistical language model (LM). To the best of our knowledge, our paper is the first attempt to directly incorporate document authoring history into retrieval models. Empirical results show that RHA provides consistent improvements for state-of-the-art retrieval models, using standard retrieval tasks and benchmarks.
Bibtextype inproceedings  +
Doi 10.1145/1871437.1871519  +
Has author Aji A. + , Yafang Wang + , Agichtein E. + , Evgeniy Gabrilovich +
Has extra keyword Amount of information + , Collaboratively generated content + , Document authoring + , Document generation + , Document ranking + , Empirical results + , History analysis + , Information retrieval models + , Key indicator + , Retrieval models + , Statistical language models + , Term Frequency + , Term weighting + , Text retrieval + , Two-state + , Wikipedia + , Computational linguistics + , Knowledge management + , Natural language processing systems + , Optical character recognition + , Search engine + , Text processing + , Word processing + , Information retrieval +
Has keyword Collaboratively generated content + , Retrieval models + , Term weighting +
Isbn 9781450300995  +
Language English +
Number of citations by publication 0  +
Number of references by publication 0  +
Pages 629–638  +
Published in International Conference on Information and Knowledge Management, Proceedings +
Title Using the past to score the present: Extending term weighting models through Revision History Analysis +
Type conference paper  +
Year 2010 +
Creation date 8 November 2014 07:36:55  +
Categories Publications without license parameter  + , Publications without remote mirror parameter  + , Publications without archive mirror parameter  + , Publications without paywall mirror parameter  + , Conference papers  + , Publications without references parameter  + , Publications  +
Modification date 8 November 2014 07:36:55  +
Date 2010  +
 

 
