Michael Strube

Michael Strube is an author.

Publications

Only those publications related to wikis are shown here.
Title Keyword(s) Published in Language Date Abstract R C
A latent variable model for discourse-aware concept and entity disambiguation 14th Conference of the European Chapter of the Association for Computational Linguistics 2014, EACL 2014 English 2014 This paper takes a discourse-oriented perspective for disambiguating common and proper noun mentions with respect to Wikipedia. Our novel approach models the relationship between disambiguation and aspects of cohesion using Markov Logic Networks with latent variables. Considering cohesive aspects consistently improves the disambiguation results on various commonly used data sets. 0 0
Transforming Wikipedia into a large scale multilingual concept network Knowledge acquisition, Knowledge base Artificial Intelligence English 2013 A knowledge base for real-world language processing applications should consist of a large base of facts and reasoning mechanisms that combine them to induce novel and more complex information. This paper describes an approach to deriving such a large scale and multilingual resource by exploiting several facets of the on-line encyclopedia Wikipedia. We show how we can build upon Wikipedia's existing network of categories and articles to automatically discover new relations and their instances. Working on top of this network allows for added information to influence the network and be propagated throughout it using inference mechanisms that connect different pieces of existing knowledge. We then exploit this gained information to discover new relations that refine some of those found in the previous step. The result is a network containing approximately 3.7 million concepts with lexicalizations in numerous languages and 49+ million relation instances. Intrinsic and extrinsic evaluations show that this is a high quality resource and beneficial to various NLP tasks. © 2012 Elsevier B.V. All rights reserved. 0 0
Jointly disambiguating and clustering concepts and entities with Markov Logic Word sense disambiguation 24th International Conference on Computational Linguistics - Proceedings of COLING 2012: Technical Papers English 2012 We present a novel approach for jointly disambiguating and clustering known and unknown concepts and entities with Markov Logic. Concept and entity disambiguation is the task of identifying the correct concept or entity in a knowledge base for a single- or multi-word noun (mention) given its context. Concept and entity clustering is the task of clustering mentions so that all mentions in one cluster refer to the same concept or entity. The proposed model (1) is global, i.e. a group of mentions in a text is disambiguated in one single step combining various global and local features, and (2) performs disambiguation, unknown concept and entity detection and clustering jointly. The disambiguation is performed with respect to Wikipedia. The model is trained once on Wikipedia articles and then applied to and evaluated on different data sets originating from newspapers, audio transcripts and internet sources. 0 0
Taxonomy induction based on a collaboratively built knowledge repository English June 2011 0 0
CoSyne: A framework for multilingual content synchronization of wikis Recognizing textual entailment, Translation, Wiki WikiSym 2011 Conference Proceedings - 7th Annual International Symposium on Wikis and Open Collaboration English 2011 Wikis allow a large base of contributors easy access to shared content, and freedom in editing it. One of the side-effects of this freedom was the emergence of parallel and independently evolving versions in a variety of languages, reflecting the multilingual background of the pool of contributors. For the Wiki to properly represent the user-added content, this should be fully available in all its languages. Working on parallel Wikis in several European languages, we investigate the possibility to "synchronize" different language versions of the same document, by: i) pinpointing topically related pieces of information in the different languages, ii) identifying information that is missing or less detailed in one of the two versions, iii) translating this in the appropriate language, iv) inserting it in the appropriate place. Progress along such directions will allow users to share more easily content across language boundaries. 0 0
Extracting world and linguistic knowledge from Wikipedia NAACL-Tutorials English 2009 0 0
Finding hedges by chasing weasels: Hedge detection using Wikipedia tags and shallow linguistic features ACL-IJCNLP 2009 - Joint Conf. of the 47th Annual Meeting of the Association for Computational Linguistics and 4th Int. Joint Conf. on Natural Language Processing of the AFNLP, Proceedings of the Conf. English 2009 We investigate the automatic detection of sentences containing linguistic hedges using corpus statistics and syntactic patterns. We take Wikipedia as an already annotated corpus using its tagged weasel words which mark sentences and phrases as non-factual. We evaluate the quality of Wikipedia as training data for hedge detection, as well as shallow linguistic features. 0 0
Finding hedges by chasing weasels: hedge detection using Wikipedia tags and shallow linguistic features ACLShort English 2009 0 0
Acquiring a Taxonomy from the German Wikipedia English 2008 0 0
Decoding Wikipedia Categories for Knowledge Acquisition English 2008 0 0
Decoding Wikipedia categories for knowledge acquisition Proceedings of the National Conference on Artificial Intelligence English 2008 This paper presents an approach to acquire knowledge from Wikipedia categories and the category network. Many Wikipedia categories have complex names which reflect human classification and organizing instances, and thus encode knowledge about class attributes, taxonomic and other semantic relations. We decode the names and refer back to the network to induce relations between concepts in Wikipedia represented through pages or categories. The category structure allows us to propagate a relation detected between constituents of a category name to numerous concept links. The results of the process are evaluated against ResearchCyc and a subset also by human judges. The results support the idea that Wikipedia category names are a rich source of useful and accurate knowledge. Copyright © 2008, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. 0 0
Distinguishing Between Instances and Classes in the Wikipedia Taxonomy English 2008 0 0
Sentence fusion via dependency graph compression EMNLP 2008 - 2008 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference: A Meeting of SIGDAT, a Special Interest Group of the ACL English 2008 We present a novel unsupervised sentence fusion method which we apply to a corpus of biographies in German. Given a group of related sentences, we align their dependency trees and build a dependency graph. Using integer linear programming we compress this graph to a new tree, which we then linearize. We use GermaNet and Wikipedia for checking semantic compatibility of co-arguments. In an evaluation with human judges our method outperforms the fusion approach of Barzilay & McKeown (2005) with respect to readability. 0 0
An API for Measuring the Relatedness of Words in Wikipedia API, Relatedness, Semantic web, Wikipedia Companion Volume to the Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pages 23-30 English 2007 We present an API for computing the semantic relatedness of words in Wikipedia. 0 1
An API for measuring the relatedness of words in Wikipedia ACL English 2007 0 1
Deriving a Large Scale Taxonomy from Wikipedia AAAI'07: Proceedings of the 22nd national conference on Artificial intelligence English 2007 We take the category system in Wikipedia as a conceptual network. We label the semantic relations between categories using methods based on connectivity in the network and lexicosyntactic matching. As a result we are able to derive a large scale taxonomy containing a large amount of subsumption, i.e. isa, relations. We evaluate the quality of the created resource by comparing it with ResearchCyc, one of the largest manually annotated ontologies, as well as computing semantic similarity between words in benchmarking datasets. 2 0
Deriving a large scale taxonomy from Wikipedia Proceedings of the National Conference on Artificial Intelligence English 2007 We take the category system in Wikipedia as a conceptual network. We label the semantic relations between categories using methods based on connectivity in the network and lexico-syntactic matching. As a result we are able to derive a large scale taxonomy containing a large amount of subsumption, i.e. isa, relations. We evaluate the quality of the created resource by comparing it with ResearchCyc, one of the largest manually annotated ontologies, as well as computing semantic similarity between words in benchmarking datasets. Copyright © 2007, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. 0 0
Knowledge derived from wikipedia for computing semantic relatedness J. Artif. Int. Res. English 2007 0 3
Exploiting Semantic Role Labeling, WordNet and Wikipedia for Coreference Resolution Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics English 2006 In this paper we present an extension of a machine learning based coreference resolution system which uses features induced from different semantic knowledge sources. These features represent knowledge mined from WordNet and Wikipedia, as well as information about semantic role labels. We show that semantic features indeed improve the performance on different referring expression types such as pronouns and common nouns. 0 0
WikiRelate! Computing Semantic Relatedness Using Wikipedia English 2006 0 5
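
Illustrative sketches

The sketches below are editorial illustrations of some of the techniques described in the abstracts above. They are toy examples written against made-up data and simplified heuristics; they are not the authors' implementations, data sets, or published APIs.

The first sketch illustrates the head-matching intuition behind "Deriving a large scale taxonomy from Wikipedia": a category link is treated as a subsumption (isa) relation when the lexical head of the child category name matches the head of the parent category name. The head extraction here is a naive split-based placeholder, whereas the published method relies on proper syntactic analysis together with connectivity-based methods.

```python
# Toy head-matching labeller for Wikipedia-style category links.
# Assumption: the "head" of a category name is its last token before a
# modifier introduced by "of", "in", "by", or "from" (a naive stand-in
# for real syntactic head extraction).

def head(category_name: str) -> str:
    tokens = category_name.lower().split()
    for stop in ("of", "in", "by", "from"):
        if stop in tokens:
            tokens = tokens[:tokens.index(stop)]
    return tokens[-1] if tokens else ""

def label_link(child: str, parent: str) -> str:
    """Label a child -> parent category link as 'isa' if the heads match."""
    return "isa" if head(child) == head(parent) else "related"

if __name__ == "__main__":
    # Hypothetical category pairs used only for illustration.
    pairs = [
        ("British computer scientists", "Computer scientists"),
        ("Rivers of Germany", "Geography of Germany"),
    ]
    for child, parent in pairs:
        print(f"{child} -> {parent}: {label_link(child, parent)}")
```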
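A related sketch for "Decoding Wikipedia categories for knowledge acquisition": category names such as "Rivers of Germany" encode a class, a relation, and a filler, which can be recovered with simple patterns over the name. The patterns and relation labels below are invented placeholders, not the paper's relation inventory.

```python
import re

# Toy decoder for category names of the form "<Class> <preposition> <Filler>".
# The relation labels are illustrative placeholders only.
PATTERNS = [
    (re.compile(r"^(?P<cls>.+?) in (?P<arg>.+)$"), "located_in"),
    (re.compile(r"^(?P<cls>.+?) of (?P<arg>.+)$"), "of"),
    (re.compile(r"^(?P<cls>.+?) by (?P<arg>.+)$"), "grouped_by"),
]

def decode(category: str):
    """Return (class, relation, filler), or (category, None, None) if no pattern applies."""
    for pattern, relation in PATTERNS:
        match = pattern.match(category)
        if match:
            return match.group("cls"), relation, match.group("arg")
    return category, None, None

for cat in ("Rivers of Germany", "Universities in Heidelberg", "Novels by Franz Kafka"):
    print(decode(cat))
```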
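For "Finding hedges by chasing weasels", the core idea is to treat sentences that Wikipedia editors have tagged as containing weasel words as (noisy) hedge annotations and to learn a detector from shallow linguistic features. The sketch below replaces the learned classifier with a hand-written cue list, purely to make the task concrete.

```python
# Toy hedge detector: flag sentences containing vague attributions.
# The cue list is invented for illustration; the paper instead trains a
# classifier on sentences that Wikipedia editors tagged as weasel-worded.
WEASEL_CUES = (
    "some people", "many believe", "it is said", "it is believed",
    "arguably", "often considered", "critics say", "some argue",
)

def is_hedged(sentence: str) -> bool:
    lowered = sentence.lower()
    return any(cue in lowered for cue in WEASEL_CUES)

examples = [
    "Some people say the article is biased.",
    "The river is 1,233 km long.",
    "It is believed that the manuscript was written around 1200.",
]
for sentence in examples:
    print(is_hedged(sentence), sentence)
```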
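Finally, a sketch in the spirit of "WikiRelate!" and the relatedness API: the relatedness of two words can be derived from the distance between their categories in the Wikipedia category graph. The graph and the inverse-path-length measure below are made up for illustration; the published work operates on the actual Wikipedia category network and article texts and combines several relatedness measures.

```python
from collections import deque

# Tiny hand-made stand-in for the Wikipedia category graph (node -> parents).
CATEGORY_GRAPH = {
    "Cat": ["Felines"],
    "Tiger": ["Felines"],
    "Felines": ["Mammals"],
    "Dog": ["Canines"],
    "Canines": ["Mammals"],
    "Mammals": ["Animals"],
    "Car": ["Vehicles"],
    "Vehicles": ["Artifacts"],
}

def path_length(a: str, b: str):
    """Shortest path between two nodes, treating category links as undirected."""
    adj = {}
    for node, parents in CATEGORY_GRAPH.items():
        for parent in parents:
            adj.setdefault(node, set()).add(parent)
            adj.setdefault(parent, set()).add(node)
    if a not in adj or b not in adj:
        return None
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for neighbour in adj[node] - seen:
            seen.add(neighbour)
            queue.append((neighbour, dist + 1))
    return None

def relatedness(a: str, b: str) -> float:
    """Map path length to a score in [0, 1]; unconnected pairs get 0."""
    dist = path_length(a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)

print(relatedness("Cat", "Tiger"))  # connected via Felines -> higher score
print(relatedness("Cat", "Car"))    # disconnected in the toy graph -> 0.0
```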