Massimiliano Ciaramita


Massimiliano Ciaramita is an author.


Only those publications related to wikis are shown here.
Title Keyword(s) Published in Language Date Abstract R C
A scalable Gibbs sampler for probabilistic entity linking Lecture Notes in Computer Science English 2014 Entity linking involves labeling phrases in text with their referent entities, such as Wikipedia or Freebase entries. This task is challenging due to the large number of possible entities, in the millions, and heavy-tailed mention ambiguity. We formulate the problem in terms of probabilistic inference within a topic model, where each topic is associated with a Wikipedia article. To deal with the large number of topics we propose a novel efficient Gibbs sampling scheme which can also incorporate side information, such as the Wikipedia graph. This conceptually simple probabilistic approach achieves state-of-the-art performance in entity-linking on the Aida-CoNLL dataset. 0 0
A framework for benchmarking entity-annotation systems Benchmark framework, Entity annotation WWW 2013 - Proceedings of the 22nd International Conference on World Wide Web English 2013 In this paper we design and implement a benchmarking framework for fair and exhaustive comparison of entity-annotation systems. The framework is based upon the definition of a set of problems related to the entity-annotation task, a set of measures to evaluate systems performance, and a systematic comparative evaluation involving all publicly available datasets, containing texts of various types such as news, tweets and Web pages. Our framework is easily-extensible with novel entity annotators, datasets and evaluation measures for comparing systems, and it has been released to the public as open source. We use this framework to perform the first extensive comparison among all available entity annotators over all available datasets, and draw many interesting conclusions upon their efficiency and effectiveness. We also draw conclusions between academic versus commercial annotators. Copyright is held by the International World Wide Web Conference Committee (IW3C2). 0 0
Learning to tag and tagging to learn: A case study on Wikipedia IEEE Intelligent Systems English 2008 Information technology experts suggest that natural language technologies will play an important role in the Web's future. The latest Web developments, such as the huge success of Web 2.0, demonstrate annotated data's significant potential. The problem of semantically annotating Wikipedia inspires a novel method for dealing with domain and task adaptation of semantic taggers in cases where parallel text and metadata are available. One main approach to tagging for acquiring knowledge from Wikipedia involves self-training that adds automatically annotated data from the target domain to the original training data. Another key approach involves structural correspondence learning, which tries to build a shared feature representation of the data. 0 0
Semantically Annotated Snapshot of the English Wikipedia LREC'08 2008 This paper describes SW1, the first version of a semantically annotated snapshot of the English Wikipedia. In recent years Wikipedia has become a valuable resource for both the Natural Language Processing (NLP) community and the Information Retrieval (IR) community. Although NLP technology for processing Wikipedia already exists, not all researchers and developers have the computational resources to process such a volume of information. Moreover, the use of different versions of Wikipedia processed differently might make it difficult to compare results. The aim of this work is to provide easy access to syntactic and semantic annotations for researchers of both NLP and IR communities by building a reference corpus to homogenize experiments and make results comparable. These resources, a semantically annotated corpus and an “entity containment” derived graph, are licensed under the GNU Free Documentation License and available from 0 1
Ranking very many typed entities on Wikipedia English 2007 0 0