STAIRS: Towards efficient full-text filtering and dissemination in DHT environments
|Author(s)||Rao W., Chen L., Fu A.W.-C.|
|Published in||VLDB Journal|
|Keyword(s)||Content dissemination, Content filtering, DHT (Extra: Distributed environments, Distributed hash tables, Full-text documents, High costs, Home nodes, Hop count, Key feature, Novel techniques, Peer to peer, Query logs, Real data sets, RSS feeds, Web searches, Wikipedia, Websites, Peer to peer networks)|
Nowadays, "live" content such as weblogs, Wikipedia, and news is ubiquitous on the Internet, and providing users with relevant content in a timely manner has become a challenging problem. Unlike Web search technologies and RSS feed/reader applications, this paper envisions a personalized full-text content filtering and dissemination system in a highly distributed environment such as a Distributed Hash Table (DHT) based Peer-to-Peer (P2P) network. Users subscribe to content of interest by specifying keywords and thresholds as filters, and content is then disseminated to the users who are interested in it. In the literature, full-text document publishing in DHTs has long suffered from the high cost of forwarding a document to the home nodes of all of its distinct terms. This cost is aggravated by the fact that a document contains a large number of distinct terms (typically tens or thousands of terms per document). In this paper, we propose a set of novel techniques that overcome this high forwarding cost by carefully selecting a very small number of meaningful terms (or key features) among the candidate terms inside each document. Next, to reduce the average hop count per forwarding, we further prune irrelevant documents along the forwarding path. Experiments based on two real query logs and two real data sets demonstrate the effectiveness of our solution.
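The forwarding cost described in the abstract can be illustrated with a minimal sketch. It is not the STAIRS algorithm: the abstract does not say how key features are chosen or how thresholds are scored, so the sketch assumes plain term frequency for selection and keyword overlap for matching. The names `home_node`, `select_key_terms`, `forwarding_targets`, `matches`, and the constant `NUM_NODES` are all hypothetical.

```python
import hashlib
from collections import Counter

NUM_NODES = 64  # assumed number of DHT nodes, for illustration only


def home_node(term, num_nodes=NUM_NODES):
    """Map a term to a home node by hashing (a stand-in for a real DHT lookup)."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def select_key_terms(text, k):
    """Return the k most frequent terms, ties broken alphabetically.

    Term frequency is a placeholder criterion, not the paper's selection method.
    """
    counts = Counter(text.lower().split())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def forwarding_targets(terms):
    """Set of home nodes a document must be forwarded to for the given terms."""
    return {home_node(term) for term in terms}


def matches(filter_terms, threshold, text):
    """Assumed filter semantics: fraction of filter keywords present >= threshold."""
    if not filter_terms:
        return False
    doc_terms = set(text.lower().split())
    score = len(set(filter_terms) & doc_terms) / len(set(filter_terms))
    return score >= threshold


doc = "dht filtering dht peers filtering dht news"
all_terms = set(doc.split())
key_terms = select_key_terms(doc, 2)

# Forwarding only to the home nodes of the selected terms can never
# contact more nodes than forwarding to the home nodes of every term.
naive_cost = len(forwarding_targets(all_terms))
reduced_cost = len(forwarding_targets(key_terms))
```

In this sketch the saving comes from contacting the home nodes of 2 selected terms instead of all 4 distinct terms. Any real selection scheme must also ensure that matching filters are still reached, which is the part the paper's techniques address and this sketch does not.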
This publication has been cited 7 time(s), but no articles for the citing works are available in WikiPapers.