Gjergji Kasneci

From WikiPapers

Gjergji Kasneci is an author.

Publications

Only those publications related to wikis are shown here.
Title Keyword(s) Published in Language Date Abstract R C
Bootstrapping Wikipedia to answer ambiguous person name queries Proceedings - International Conference on Data Engineering English 2014 Some of the main ranking features of today's search engines reflect result popularity and are based on ranking models, such as PageRank, implicit feedback aggregation, and more. While such features yield satisfactory results for a wide range of queries, they aggravate the problem of search for ambiguous entities: Searching for a person yields satisfactory results only if the person in question is represented by a high-ranked Web page and all required information are contained in this page. Otherwise, the user has to either reformulate/refine the query or manually inspect low-ranked results to find the person in question. A possible approach to solve this problem is to cluster the results, so that each cluster represents one of the persons occurring in the answer set. However clustering search results has proven to be a difficult endeavor by itself, where the clusters are typically of moderate quality. A wealth of useful information about persons occurs in Web 2.0 platforms, such as Wikipedia, LinkedIn, Facebook, etc. Being human-generated, the information on these platforms is clean, focused, and already disambiguated. We show that when searching with ambiguous person names the information from Wikipedia can be bootstrapped to group the results according to the individuals occurring in them. We have evaluated our methods on a hand-labeled dataset of around 5,000 Web pages retrieved from Google queries on 50 ambiguous person names. 0 0
NAGA: Harvesting, searching and ranking knowledge Entities, Ranking, Relationships, Semantic search, User interface Proceedings of the ACM SIGMOD International Conference on Management of Data English 2008 The presence of encyclopedic Web sources, such as Wikipedia, the Internet Movie Database (IMDB), World Factbook, etc. calls for new querying techniques that are simple and yet more expressive than those provided by standard keyword-based search engines. Searching for explicit knowledge needs to consider inherent semantic structures involving entities and relationships. In this demonstration proposal, we describe a semantic search system named NAGA. NAGA operates on a knowledge graph, which contains millions of entities and relationships derived from various encyclopedic Web sources, such as the ones above. NAGA's graph-based query language is geared towards expressing queries with additional semantic information. Its scoring model is based on the principles of generative language models, and formalizes several desiderata such as confidence, informativeness and compactness of answers. We propose a demonstration of NAGA which will allow users to browse the knowledge base through a user interface, enter queries in NAGA's query language and tune the ranking parameters to test various ranking aspects. 0 0
The YAGO-NAGA approach to knowledge discovery SIGMOD Record 2008 This paper gives an overview on the YAGO-NAGA approach to information extraction for building a conveniently searchable, large-scale, highly accurate knowledge base of common facts. YAGO harvests infoboxes and category names of Wikipedia for facts about individual entities, and it reconciles these with the taxonomic backbone of WordNet in order to ensure that all entities have proper classes and the class system is consistent. Currently, the YAGO knowledge base contains about 19 million instances of binary relations for about 1.95 million entities. Based on intensive sampling, its accuracy is estimated to be above 95 percent. The paper presents the architecture of the YAGO extractor toolkit, its distinctive approach to consistency checking, its provisions for maintenance and further growth, and the query engine for YAGO, coined NAGA. It also discusses ongoing work on extensions towards integrating fact candidates extracted from natural-language text sources. 0 0
YAGO: A Core of Semantic Knowledge Unifying WordNet and Wikipedia Proceedings of the 16th international conference on World Wide Web 2007 We present YAGO, a light-weight and extensible ontology with high coverage and quality. YAGO builds on entities and relations and currently contains more than 1 million entities and 5 million facts. This includes the Is-A hierarchy as well as non-taxonomic relations between entities (such as HASWONPRIZE). The facts have been automatically extracted from Wikipedia and unified with WordNet, using a carefully designed combination of rule-based and heuristic methods described in this paper. The resulting knowledge base is a major step beyond WordNet: in quality by adding knowledge about individuals like persons, organizations, products, etc. with their semantic relationships - and in quantity by increasing the number of facts by more than an order of magnitude. Our empirical evaluation of fact correctness shows an accuracy of about 95%. YAGO is based on a logically clean model, which is decidable, extensible, and compatible with RDFS. Finally, we show how YAGO can be further extended by state-of-the-art information extraction techniques. 0 0
YAWN: A Semantically Annotated Wikipedia XML Corpus BTW2007 2007 The paper presents YAWN, a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. This annotation process exploits categorical information in Wikipedia, which is a high-quality, manually assigned source of information, extracts additional information from lists, and utilizes the invocations of templates with named parameters. We give examples how such annotations can be exploited for high-precision queries. 0 0
YAWN: A semantically annotated Wikipedia XML corpus Datenbanksysteme in Business, Technologie und Web, BTW 2007 - 12th Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS), Proceedings 2007 The paper presents YAWN, a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. This annotation process exploits categorical information in Wikipedia, which is a high-quality, manually assigned source of information, extracts additional information from lists, and utilizes the invocations of templates with named parameters. We give examples how such annotations can be exploited for high-precision queries. 0 0
Yago: A core of semantic knowledge Wikipedia, Wordnet 16th International World Wide Web Conference, WWW2007 English 2007 We present YAGO, a light-weight and extensible ontology with high coverage and quality. YAGO builds on entities and relations and currently contains more than 1 million entities and 5 million facts. This includes the Is-A hierarchy as well as non-taxonomic relations between entities (such as HASWONPRIZE). The facts have been automatically extracted from Wikipedia and unified with WordNet, using a carefully designed combination of rule-based and heuristic methods described in this paper. The resulting knowledge base is a major step beyond WordNet: in quality by adding knowledge about individuals like persons, organizations, products, etc. with their semantic relationships - and in quantity by increasing the number of facts by more than an order of magnitude. Our empirical evaluation of fact correctness shows an accuracy of about 95%. YAGO is based on a logically clean model, which is decidable, extensible, and compatible with RDFS. Finally, we show how YAGO can be further extended by state-of-the-art information extraction techniques. 0 0