Olena Medelyan


Olena Medelyan is an author.

Publications

Only those publications related to wikis are shown here.
Each entry below lists the publication's title, keyword(s), venue, language, date, abstract, and the wiki's R and C counts.
Title: Constructing a focused taxonomy from a document collection
Published in: Lecture Notes in Computer Science. Language: English. Date: 2013.
Abstract: We describe a new method for constructing custom taxonomies from document collections. It involves identifying relevant concepts and entities in text; linking them to knowledge sources like Wikipedia, DBpedia, Freebase, and any supplied taxonomies from related domains; disambiguating conflicting concept mappings; and selecting semantic relations that best group them hierarchically. An RDF model supports interoperability of these steps, and also provides a flexible way of including existing NLP tools and further knowledge sources. From 2000 news articles we construct a custom taxonomy with 10,000 concepts and 12,700 relations, similar in structure to manually created counterparts. Evaluation by 15 human judges shows the precision to be 89% and 90% for concepts and relations respectively; recall was 75% with respect to a manually generated taxonomy for the same domain.
R: 0  C: 0
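
To make the pipeline's shape concrete, here is a minimal, hypothetical Python sketch. Every helper is a toy stand-in (capitalised-word spotting, a dictionary lookup in place of Wikipedia/DBpedia/Freebase), not the authors' actual method:

    # Toy sketch of the four pipeline stages named in the abstract.
    def extract_concepts(docs):
        # stand-in concept spotter: treat capitalised words as candidates
        return {w.strip(".,") for d in docs for w in d.split() if w.istitle()}

    def link_concepts(concepts, knowledge):
        # stand-in linker: keep only concepts a knowledge source knows about
        return {c: knowledge[c] for c in concepts if c in knowledge}

    def disambiguate(mappings):
        # stand-in disambiguator: nothing conflicts in this toy data
        return mappings

    def select_relations(mappings):
        # stand-in relation selector: attach each concept to its linked entry
        return [(concept, parent) for concept, parent in mappings.items()]

    knowledge = {"Paris": "City", "France": "Country"}  # stand-in knowledge source
    docs = ["Paris is the capital of France."]
    print(select_relations(disambiguate(link_concepts(extract_concepts(docs), knowledge))))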
"All You Can Eat" Ontology-Building: Feeding Wikipedia to Cyc Cyc
Wikipedia
Ontology
Web mining
WI-IAT English 2009 In order to achieve genuine web intelligence, building some kind of large general machine-readable conceptual scheme (i.e. ontology) seems inescapable. Yet the past 20 years have shown that manual ontology-building is not practicable. The recent explosion of free user-supplied knowledge on the Web has led to great strides in automatic ontology-building, but quality-control is still a major issue. Ideally one should automatically build onto an already intelligent base. We suggest that the long-running Cyc project is able to assist here. We describe methods used to add 35K new concepts mined from Wikipedia to collections in ResearchCyc entirely automatically. Evaluation with 22 human subjects shows high precision both for the new concepts’ categorization, and their assignment as individuals or collections. Most importantly we show how Cyc itself can be leveraged for ontological quality control by ‘feeding’ it assertions one by one, enabling it to reject those that contradict its other knowledge. 0 0
"All you can eat" ontology-building: Feeding wikipedia to Cyc Proceedings - 2009 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2009 English 2009 In order to achieve genuine web intelligence, building some kind of large general machine-readable conceptual scheme (i.e. ontology) seems inescapable. Yet the past 20 years have shown that manual ontology-building is not practicable. The recent explosion of free user-supplied knowledge on the Web has led to great strides in automatic ontology-building, but quality-control is still a major issue. Ideally one should automatically build onto an already intelligent base. We suggest that the long-running Cyc project is able to assist here. We describe methods used to add 35K new concepts mined from Wikipedia to collections in ResearchCyc entirely automatically. Evaluation with 22 human subjects shows high precision both for the new concepts' categorization, and their assignment as individuals or collections. Most importantly we show how Cyc itself can be leveraged for ontological quality control by 'feeding' it assertions one by one, enabling it to reject those that contradict its other knowledge. 0 0
Title: Analysis of community structure in Wikipedia (poster)
Keywords: Community detection, Graph analysis, Wikipedia
Published in: WWW'09 - Proceedings of the 18th International World Wide Web Conference. Language: English. Date: 2009.
Abstract: We present the results of a community detection analysis of the Wikipedia graph. Distinct communities in Wikipedia contain semantically closely related articles. The central topic of a community can be identified using PageRank. Extracted communities can be organized hierarchically, similar to the manually created Wikipedia category structure.
R: 0  C: 0
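
The two steps of the analysis can be sketched with networkx, using a small built-in graph as a stand-in for the Wikipedia link graph and greedy modularity maximisation as one possible detection algorithm (the abstract does not name one):

    # Sketch: detect communities, then use PageRank to pick each community's
    # central topic. All data and algorithm choices here are stand-ins.
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    G = nx.karate_club_graph()  # stand-in for the Wikipedia article graph
    rank = nx.pagerank(G)
    for community in greedy_modularity_communities(G):
        central = max(community, key=rank.get)
        print(f"community of {len(community)} nodes, central topic: node {central}")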
Title: Human-competitive tagging using automatic keyphrase extraction
Published in: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP 2009). Language: English. Date: 2009.
Abstract: This paper connects two research areas: automatic tagging on the web and statistical keyphrase extraction. First, we analyze the quality of tags in a collaboratively created folksonomy using traditional evaluation techniques. Next, we demonstrate how documents can be tagged automatically with a state-of-the-art keyphrase extraction algorithm, and further improve performance in this new domain using a new algorithm, "Maui", that utilizes semantic information extracted from Wikipedia. Maui outperforms existing approaches and extracts tags that are competitive with those assigned by the best performing human taggers.
R: 0  C: 0
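
As a flavour of the statistical side, here is a bare-bones frequency-based tag extractor; Maui's real feature set (including the Wikipedia-derived semantics mentioned above and a learned model) is far richer than this toy:

    # Toy keyphrase extraction: rank candidate words by frequency.
    from collections import Counter

    def extract_tags(text, k=3):
        stop = {"the", "a", "of", "and", "is", "to", "in"}
        words = [w.lower().strip(".,;:") for w in text.split()]
        counts = Counter(w for w in words if w not in stop and len(w) > 2)
        return [w for w, _ in counts.most_common(k)]

    print(extract_tags("Wikipedia articles link to related Wikipedia articles."))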
Title: Mining meaning from Wikipedia
Keywords: Information extraction, Information retrieval, Natural Language Processing, Ontology, Semantic web, Text mining, Wikipedia, Data mining
Published in: International Journal of Human-Computer Studies. Language: English. Date: 2009.
Abstract: Wikipedia is a goldmine of information; not just for its many readers, but also for the growing community of researchers who recognize it as a resource of exceptional scale and utility. It represents a vast investment of manual effort and judgment: a huge, constantly evolving tapestry of concepts and relations that is being applied to a host of tasks. This article provides a comprehensive description of this work. It focuses on research that extracts and makes use of the concepts, relations, facts and descriptions found in Wikipedia, and organizes the work into four broad categories: applying Wikipedia to natural language processing; using it to facilitate information retrieval; using it for information extraction; and using it as a resource for ontology building. The article addresses how Wikipedia is being used as is, how it is being improved and adapted, and how it is being combined with other structures to create entirely new resources. We identify the research groups and individuals involved, and how their work has developed in the last few years. We provide a comprehensive list of the open-source software they have produced.
R: 0  C: 4
Title: Augmenting Domain-Specific Thesauri with Knowledge from Wikipedia
Keywords: Thesauri, Wikipedia, Word sense disambiguation
Published in: New Zealand Computer Science Research Student Conference (NZCSRSC 2008) - Proceedings. Language: English. Date: 2008.
Abstract: We propose a new method for extending a domain-specific thesaurus with valuable information from Wikipedia. The main obstacle is disambiguating thesaurus concepts to the correct Wikipedia articles. Given the concept name, we first identify candidate mappings by analyzing article titles, their redirects and disambiguation pages. Then, for each candidate, we compute a link-based similarity score to all mappings of context terms related to this concept. The article with the highest score is then used to augment the thesaurus concept. It is the source for the extended gloss explaining the concept's meaning, synonymous expressions that can be used as additional non-descriptors in the thesaurus, translations of the concept into other languages, and new domain-relevant concepts.
R: 0  C: 1
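
The scoring step lends itself to a short sketch: each candidate article is scored by its link overlap with the articles of surrounding context terms. Jaccard overlap and the link data below are stand-ins for the paper's link-based similarity and Wikipedia's actual links:

    # Toy disambiguation: pick the candidate whose links best match the context.
    def link_similarity(a, b, links):
        la, lb = links.get(a, set()), links.get(b, set())
        return len(la & lb) / len(la | lb) if la | lb else 0.0

    def disambiguate(candidates, context_articles, links):
        return max(candidates,
                   key=lambda c: sum(link_similarity(c, ctx, links)
                                     for ctx in context_articles))

    links = {
        "Jaguar (animal)": {"Cat", "Rainforest"},
        "Jaguar (car)": {"Car", "England"},
        "Leopard": {"Cat", "Rainforest"},
    }
    print(disambiguate(["Jaguar (animal)", "Jaguar (car)"], ["Leopard"], links))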
Title: Integrating Cyc and Wikipedia: Folksonomy Meets Rigorously Defined Common-Sense
Published in: AAAI Workshop - Technical Report. Language: English. Date: 2008.
Abstract: Integration of ontologies begins with establishing mappings between their concept entries. We map categories from the largest manually built ontology, Cyc, onto Wikipedia articles describing corresponding concepts. Our method draws both on Wikipedia's rich but chaotic hyperlink structure and Cyc's carefully defined taxonomic and common-sense knowledge. On 9,333 manual alignments by one person, we achieve an F-measure of 90%; on 100 alignments by six human subjects, the average agreement of the method with the subjects is close to their agreement with each other. We cover 62.8% of Cyc categories relating to common-sense knowledge and discuss what further information might be added to Cyc given this substantial new alignment.
R: 0  C: 0
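
The evaluation figures are ordinary precision/recall arithmetic. A small worked example with invented alignment pairs (Cyc category, Wikipedia article):

    # F-measure of a proposed alignment against gold mappings (toy data).
    def f_measure(proposed, gold):
        correct = len(proposed & gold)
        p, r = correct / len(proposed), correct / len(gold)
        return 2 * p * r / (p + r) if p + r else 0.0

    gold = {("Dog", "Dog"), ("Cat", "Cat"), ("CanisLupus", "Wolf")}
    proposed = {("Dog", "Dog"), ("Cat", "Cat"), ("CanisLupus", "Gray wolf")}
    print(round(f_measure(proposed, gold), 2))  # 2 of 3 correct -> F = 0.67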
Title: Topic Indexing with Wikipedia
Published in: WikiAI. Language: English. Date: 2008.
R: 0  C: 2
Title: Mining Domain-Specific Thesauri from Wikipedia: A Case Study
Keywords: Data mining, Information retrieval, Semantic, Text mining, Wikipedia
Published in: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence (WI'06). Language: English. Date: 2006.
Abstract: Domain-specific thesauri are high-cost, high-maintenance, high-value knowledge structures. We show how the classic thesaurus structure of terms and links can be mined automatically from Wikipedia, a vast, open encyclopedia. In a comparison with a professional thesaurus for agriculture (Agrovoc), we find that Wikipedia contains a substantial proportion of its domain-specific concepts and semantic relations; furthermore, it has impressive coverage of a collection of contemporary documents in the domain. Thesauri derived using these techniques are attractive because they capitalize on existing public efforts and tend to reflect contemporary language usage better than their costly, painstakingly constructed manual counterparts.
R: 0  C: 1
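
The classic structure of terms and links can be illustrated with broader-term (BT) relations read off category memberships; the data below is a stand-in for Wikipedia's category graph, not Agrovoc:

    # Toy thesaurus mining: derive broader-term links from category membership.
    categories = {          # article -> categories (stand-in data)
        "Wheat": ["Cereals", "Crops"],
        "Maize": ["Cereals", "Crops"],
        "Cereals": ["Crops"],
    }
    for term, cats in categories.items():
        for cat in cats:
            print(f"{term} BT {cat}")  # BT = 'broader term' in thesaurus notation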