CE2 - Towards a large scale hybrid search engine with integrated ranking support

CE2 - Towards a large scale hybrid search engine with integrated ranking support is a 2008 conference paper written in English by Wang H., Tran T. and Liu C., published in the Proceedings of the International Conference on Information and Knowledge Management (CIKM).

Abstract

The Web contains a large number of documents and, increasingly, also semantic data in the form of RDF triples. Many of these triples are annotations associated with documents. While structured queries are the principal means of retrieving semantic data, keyword queries are typically used for document retrieval. Clearly, a form of hybrid search that seamlessly integrates these formalisms to query both documents and semantic data can address more complex information needs. In this paper, we present CE2, an integrated solution that leverages mature database and information retrieval technologies to tackle the challenges of hybrid search at large scale. For scalable storage, CE2 integrates a database with inverted indices. Hybrid query processing is supported in CE2 through novel algorithms and data structures, which allow advanced ranking schemes to be integrated more tightly into query processing. Experiments conducted on DBpedia and Wikipedia show that CE2 provides good performance in terms of both effectiveness and efficiency.
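The abstract describes hybrid queries that combine a structured condition over RDF triples with keywords over the documents those triples annotate, with ranking built into query processing. The paper's own algorithms and data structures are not described on this page, so the following is only a minimal sketch of the general idea under stated assumptions: the toy data, the one-document-per-entity annotation, and the function names build_inverted_index, match_pattern and hybrid_search are all hypothetical, and the ranking is plain TF-IDF summed per entity, swapped in as a stand-in rather than CE2's ranking scheme.

```python
import math
from collections import defaultdict

# Hypothetical toy data (not from the paper): RDF-style triples and
# documents, each annotated with the entity it describes.
TRIPLES = [
    ("Berlin", "type", "City"), ("Berlin", "country", "Germany"),
    ("Munich", "type", "City"), ("Munich", "country", "Germany"),
    ("Paris", "type", "City"), ("Paris", "country", "France"),
]
DOCS = {
    "d1": ("Berlin", "berlin is the capital and largest city of germany"),
    "d2": ("Munich", "munich hosts the oktoberfest beer festival"),
    "d3": ("Paris", "paris is the capital of france"),
}

def build_inverted_index(docs):
    """Keyword side: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, (_, text) in docs.items():
        for term in text.split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def match_pattern(triples, predicate, obj):
    """Structured side: all subjects s with a triple (s, predicate, obj)."""
    return {s for s, p, o in triples if p == predicate and o == obj}

def hybrid_search(triples, docs, index, predicate, obj, keywords):
    """Rank entities satisfying the triple pattern by the TF-IDF of keywords
    in their annotated documents. A hypothetical stand-in, not CE2's method."""
    candidates = match_pattern(triples, predicate, obj)
    n_docs = len(docs)
    scores = defaultdict(float)
    for term in keywords:
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            entity = docs[doc_id][0]
            if entity in candidates:  # join keyword hits with structured results
                scores[entity] += tf * idf
    # Highest score first; candidates with no keyword hit are omitted.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

index = build_inverted_index(DOCS)
print(hybrid_search(TRIPLES, DOCS, index, "country", "Germany", ["capital", "festival"]))
```

For the query "cities in Germany matching capital or festival", Paris is excluded by the structured condition even though its document contains "capital", and Munich outranks Berlin because "festival" is rarer than "capital" in this collection. The sketch filters by the triple pattern and then scores keyword postings; a real engine would presumably interleave these steps over indexed storage rather than scanning lists.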
