Fabian Suchanek

From WikiPapers
Jump to: navigation, search

Fabian Suchanek is an author.

Publications

Only those publications related to wikis are shown here.
Title Keyword(s) Published in Language DateThis property is a special property in this wiki. Abstract R C
Semantic full-text search with ESTER: Scalable, easy, fast Proceedings - IEEE International Conference on Data Mining Workshops, ICDM Workshops 2008 English 2008 We present a demo of ESTER, a search engine that combines the ease of use, speed and scalability of full-text search with the powerful semantic capabilities of ontologies. ESTER supports full-text queries, ontological queries and combinations of these, yet its interface is as easy as can be: A standard search field with semantic information provided interactively as one types. ESTER works by reducing all queries to two basic operations: prefix search and join, which can be implemented very efficiently in terms of both processing time and index space. We demonstrate the capabilities of ESTER on a combination of the English Wikipedia with the Yago ontology, with response times below 100 milliseconds for most queries, and an index size of about 4 GB. The system can be run both stand-alone and as a Web application. 0 0
The YAGO-NAGA approach to knowledge discovery SIGMOD Record 2008 This paper gives an overview on the {YAGO-NAGA} approach to information extraction for building a conveniently searchable, large-scale, highly accurate knowledge base of common facts. {YAGO} harvests infoboxes and category names of Wikipedia for facts about individual entities, and it reconciles these with the taxonomic backbone of {WordNet} in order to ensure that all entities have proper classes and the class system is consistent. Currently, the {YAGO} knowledge base contains about 19 million instances of binary relations for about 1.95 million entities. Based on intensive sampling, its accuracy is estimated to be above 95 percent. The paper presents the architecture of the {YAGO} extractor toolkit, its distinctive approach to consistency checking, its provisions for maintenance and further growth, and the query engine for {YAGO,} coined {NAGA.} It also discusses ongoing work on extensions towards integrating fact candidates extracted from natural-language text sources. 0 0
ESTER: Efficient search on text, entities, and relations Interactive
Ontology
Proactive
Semantic search
Wikipedia
Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07 English 2007 We present ESTER, a modular and highly efficient system for combined full-text and ontology search. ESTER builds on a query engine that supports two basic operations: prefix search and join. Both of these can be implemented very efficiently with a compact index, yet in combination provide powerful querying capabilities. We show how ESTER can answer basic SPARQL graph-pattern queries on the ontology by reducing them to a small number of these two basic operations. ESTER further supports a natural blend of such semantic queries with ordinary full-text queries. Moreover, the prefix search operation allows for a fully interactive and proactive user interface, which after every keystroke suggests to the user possible semantic interpretations of his or her query, and speculatively executes the most likely of these interpretations. As a proof of concept, we applied ESTER to the English Wikipedia, which contains about 3 million documents, combined with the recent YAGO ontology, which contains about 2.5 million facts. For a variety of complex queries, ESTER achieves worst-case query processing times of a fraction of a second, on a single machine, with an index size of about 4 GB. Copyright 2007 ACM. 0 0
YAWN: A Semantically Annotated Wikipedia XML Corpus BTW2007 2007 The paper presents YAWN, a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. This annotation process exploits categorical information in Wikipedia, which is a high-quality, manually assigned source of information, extracts additional information from lists, and utilizes the invocations of templates with named parameters. We give examples how such annotations can be exploited for high-precision queries. 0 0
YAWN: A semantically annotated Wikipedia XML corpus Datenbanksysteme in Business, Technologie und Web, BTW 2007 - 12th Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS), Proceedings 2007 The paper presents YAWN, a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. This annotation process exploits categorical information in Wikipedia, which is a high-quality, manually assigned source of information, extracts additional information from lists, and utilizes the invocations of templates with named parameters. We give examples how such annotations can be exploited for high-precision queries. 0 0