Knowledge base

From WikiPapers

Knowledge base is included as a keyword or extra keyword in 0 datasets, 0 tools and 117 publications.

Datasets

There are no datasets for this keyword.

Tools

There are no tools for this keyword.


Publications

Title Author(s) Published in Language Date Abstract R C
Evaluating the helpfulness of linked entities to readers Yamada I.
Ito T.
Usami S.
Takagi S.
Hideaki Takeda
Takefuji Y.
HT 2014 - Proceedings of the 25th ACM Conference on Hypertext and Social Media English 2014 When we encounter an interesting entity (e.g., a person's name or a geographic location) while reading text, we typically search and retrieve relevant information about it. Entity linking (EL) is the task of linking entities in a text to the corresponding entries in a knowledge base, such as Wikipedia. Recently, EL has received considerable attention. EL can be used to enhance a user's text reading experience by streamlining the process of retrieving information on entities. Several EL methods have been proposed, though they tend to extract all of the entities in a document including unnecessary ones for users. Excessive linking of entities can be distracting and degrade the user experience. In this paper, we propose a new method for evaluating the helpfulness of linking entities to users. We address this task using supervised machine-learning with a broad set of features. Experimental results show that our method significantly outperforms baseline methods by approximately 5.7%-12% F1. In addition, we propose an application, Linkify, which enables developers to integrate EL easily into their web sites. 0 0
Trust, but verify: Predicting contribution quality for knowledge base construction and curation Tan C.H.
Agichtein E.
Ipeirotis P.
Evgeniy Gabrilovich
WSDM 2014 - Proceedings of the 7th ACM International Conference on Web Search and Data Mining English 2014 The largest publicly available knowledge repositories, such as Wikipedia and Freebase, owe their existence and growth to volunteer contributors around the globe. While the majority of contributions are correct, errors can still creep in, due to editors' carelessness, misunderstanding of the schema, malice, or even lack of accepted ground truth. If left undetected, inaccuracies often degrade the experience of users and the performance of applications that rely on these knowledge repositories. We present a new method, CQUAL, for automatically predicting the quality of contributions submitted to a knowledge base. Significantly expanding upon previous work, our method holistically exploits a variety of signals, including the user's domains of expertise as reflected in her prior contribution history, and the historical accuracy rates of different types of facts. In a large-scale human evaluation, our method exhibits precision of 91% at 80% recall. Our model verifies whether a contribution is correct immediately after it is submitted, significantly alleviating the need for post-submission human reviewing. 0 0
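The CQUAL entry above describes a supervised classifier over contribution features such as the editor's domain expertise and the historical accuracy of fact types. A minimal sketch of that general idea follows; the feature set, training data, and choice of logistic regression are illustrative assumptions, not the paper's actual model.

```python
# Hedged sketch: a supervised quality classifier over hand-picked contribution
# features, in the spirit of CQUAL. Feature names and data are illustrative only.
from sklearn.linear_model import LogisticRegression

# Each contribution: [editor's prior accuracy in this domain,
#                     historical accuracy of this fact type,
#                     editor's total accepted edits (log-scaled)]
X_train = [
    [0.95, 0.90, 3.2],   # likely correct contribution
    [0.40, 0.55, 0.7],   # likely incorrect contribution
    [0.88, 0.85, 2.9],
    [0.30, 0.60, 0.3],
]
y_train = [1, 0, 1, 0]   # 1 = correct, 0 = incorrect (manually labelled)

model = LogisticRegression()
model.fit(X_train, y_train)

# Score a new contribution immediately after submission.
new_contribution = [[0.75, 0.80, 1.5]]
print(model.predict_proba(new_contribution)[0][1])  # probability it is correct
```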
Using linked data to mine RDF from Wikipedia's tables Munoz E.
Hogan A.
Mileo A.
WSDM 2014 - Proceedings of the 7th ACM International Conference on Web Search and Data Mining English 2014 The tables embedded in Wikipedia articles contain rich, semi-structured encyclopaedic content. However, the cumulative content of these tables cannot be queried against. We thus propose methods to recover the semantics of Wikipedia tables and, in particular, to extract facts from them in the form of RDF triples. Our core method uses an existing Linked Data knowledge-base to find pre-existing relations between entities in Wikipedia tables, suggesting the same relations as holding for other entities in analogous columns on different rows. We find that such an approach extracts RDF triples from Wikipedia's tables at a raw precision of 40%. To improve the raw precision, we define a set of features for extracted triples that are tracked during the extraction phase. Using a manually labelled gold standard, we then test a variety of machine learning methods for classifying correct/incorrect triples. One such method extracts 7.9 million unique and novel RDF triples from over one million Wikipedia tables at an estimated precision of 81.5%. 0 0
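The table-mining entry above rests on one core step: if an existing Linked Data knowledge base already relates the entities of two columns in some row, the same relation is suggested for the entity pairs in the other rows. A minimal sketch over invented data follows; the predicate name, the table contents, and the knowledge-base lookup are placeholders.

```python
# Hedged sketch of relation propagation across table rows: if an existing
# knowledge base already relates the entities in two columns of some row,
# suggest the same relation for the other rows. All data here is illustrative.

# Pretend linked-data facts: (subject, predicate, object)
known_triples = {
    ("Dublin", "capitalOf", "Ireland"),
}

# A tiny two-column "Wikipedia table" of (city, country) rows.
table_rows = [
    ("Dublin", "Ireland"),
    ("Paris", "France"),
    ("Madrid", "Spain"),
]

def propose_triples(rows, kb):
    # Find predicates that already hold between column-1 and column-2 entities.
    evidence = {p for (s, p, o) in kb for (a, b) in rows if s == a and o == b}
    # Propagate each evidenced predicate to every other row as a candidate triple.
    return [(a, p, b) for p in evidence for (a, b) in rows if (a, p, b) not in kb]

for triple in propose_triples(table_rows, known_triples):
    print(triple)   # e.g. ('Paris', 'capitalOf', 'France')
```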
A generic open world named entity disambiguation approach for tweets Habib M.B.
Van Keulen M.
IC3K 2013; KDIR 2013 - 5th International Conference on Knowledge Discovery and Information Retrieval and KMIS 2013 - 5th International Conference on Knowledge Management and Information Sharing, Proc. English 2013 Social media is a rich source of information. To make use of this information, it is sometimes required to extract and disambiguate named entities. In this paper, we focus on named entity disambiguation (NED) in Twitter messages. NED in tweets is challenging in two ways. First, the limited length of a tweet makes it hard to obtain enough context, on which many disambiguation techniques depend. Second, many named entities in tweets do not exist in a knowledge base (KB). We share ideas from information retrieval (IR) and NED to propose solutions for both challenges. For the first problem we make use of the gregarious nature of tweets to get enough context for disambiguation. For the second problem we look for an alternative home page if no Wikipedia page represents the entity. Given a mention, we obtain a list of Wikipedia candidates from the YAGO KB in addition to top-ranked pages from the Google search engine. We use a Support Vector Machine (SVM) to rank the candidate pages and find the best representative entities. Experiments conducted on two data sets show better disambiguation results compared with the baselines and a competitor. 0 0
BlueFinder: Recommending wikipedia links using DBpedia properties Torres D.
Hala Skaf-Molli
Pascal Molli
Diaz A.
Proceedings of the 3rd Annual ACM Web Science Conference, WebSci 2013 English 2013 The DBpedia knowledge base has been built from data extracted from Wikipedia. However, many existing relations among resources in DBpedia are missing links among articles from Wikipedia. In some cases, adding these links to Wikipedia will enrich Wikipedia content and therefore enable better navigation. In previous work, we proposed the PIA algorithm, which predicts the best link to connect two articles in Wikipedia corresponding to those related by a semantic property in DBpedia while respecting the Wikipedia convention. PIA calculates this link as a path query. After introducing PIA results in Wikipedia, most of them were accepted by the Wikipedia community. However, some were rejected because PIA predicts path queries that are too general. In this paper, we report the BlueFinder collaborative filtering algorithm, which fixes PIA's miscalculations. It is sensitive to the specificity of the resource types. In the conducted experiments we found that BlueFinder is a better solution than PIA because it solves more cases with a better recall. Copyright 2013 ACM. 0 0
Building, maintaining, and using knowledge bases: A report from the trenches Deshpande O.
Lamba D.S.
Tourn M.
Sanmay Das
Subramaniam S.
Rajaraman A.
Harinarayan V.
Doan A.
Proceedings of the ACM SIGMOD International Conference on Management of Data English 2013 A knowledge base (KB) contains a set of concepts, instances, and relationships. Over the past decade, numerous KBs have been built, and used to power a growing array of applications. Despite this flurry of activities, however, surprisingly little has been published about the end-to-end process of building, maintaining, and using such KBs in industry. In this paper we describe such a process. In particular, we describe how we build, update, and curate a large KB at Kosmix, a Bay Area startup, and later at WalmartLabs, a development and research lab of Walmart. We discuss how we use this KB to power a range of applications, including query understanding, Deep Web search, in-context advertising, event monitoring in social media, product search, social gifting, and social mining. Finally, we discuss how the KB team is organized, and the lessons learned. Our goal with this paper is to provide a real-world case study, and to contribute to the emerging direction of building, maintaining, and using knowledge bases for data management applications. Copyright 0 0
Cross lingual entity linking with bilingual topic model Zhang T.
Kang Liu
Jun Zhao
IJCAI International Joint Conference on Artificial Intelligence English 2013 Cross-lingual entity linking means linking an entity mention in a background source document in one language with the corresponding real-world entity in a knowledge base written in the other language. The key problem is to measure the similarity score between the context of the entity mention and the document of the candidate entity. This paper presents a general framework for cross-lingual entity linking that leverages a large-scale bilingual knowledge base, Wikipedia. We introduce a bilingual topic model that mines bilingual topics from this knowledge base under the assumption that the Wikipedia documents of the same concept in two different languages share the same semantic topic distribution. The extracted topics have two types of representation, with each type corresponding to one language. Thus both the context of the entity mention and the document of the candidate entity can be represented in a space using the same semantic topics. We use these topics to do cross-lingual entity linking. Experimental results show that the proposed approach obtains competitive results compared with the state-of-the-art approach. 0 0
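The cross-lingual linking entry above reduces candidate ranking to a similarity between topic distributions in a shared bilingual topic space. The sketch below shows only that final ranking step with made-up topic vectors; the bilingual topic model itself is not implemented here.

```python
# Hedged sketch: once mention context and candidate entity documents are mapped
# into the same bilingual topic space, candidates can be ranked by cosine
# similarity. The topic distributions below are made up for illustration.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

mention_context_topics = [0.6, 0.3, 0.1]          # e.g. English-side context
candidates = {
    "候选实体A": [0.5, 0.4, 0.1],                  # Chinese-side KB documents
    "候选实体B": [0.1, 0.2, 0.7],
}

best = max(candidates, key=lambda c: cosine(mention_context_topics, candidates[c]))
print(best)  # the candidate whose topic distribution best matches the context
```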
Determining relation semantics by mapping relation phrases to knowledge base Liu F.
Yuanyuan Liu
Guangyou Zhou
Kang Liu
Jun Zhao
Proceedings - 2nd IAPR Asian Conference on Pattern Recognition, ACPR 2013 English 2013 0 0
Development and evaluation of an ensemble resource linking medications to their indications Wei W.-Q.
Cronin R.M.
Xu H.
Lasko T.A.
Bastarache L.
Denny J.C.
Journal of the American Medical Informatics Association English 2013 Objective: To create a computable MEDication Indication resource (MEDI) to support primary and secondary use of electronic medical records (EMRs). Materials and methods: We processed four public medication resources, RxNorm, Side Effect Resource (SIDER) 2, MedlinePlus, and Wikipedia, to create MEDI. We applied natural language processing and ontology relationships to extract indications for prescribable, single-ingredient medication concepts and all ingredient concepts as defined by RxNorm. Indications were coded as Unified Medical Language System (UMLS) concepts and International Classification of Diseases, 9th edition (ICD9) codes. A total of 689 extracted indications were randomly selected for manual review for accuracy using dual-physician review. We identified a subset of medication-indication pairs that optimizes recall while maintaining high precision. Results: MEDI contains 3112 medications and 63 343 medication-indication pairs. Wikipedia was the largest resource, with 2608 medications and 34 911 pairs. For each resource, estimated precision and recall, respectively, were 94% and 20% for RxNorm, 75% and 33% for MedlinePlus, 67% and 31% for SIDER 2, and 56% and 51% for Wikipedia. The MEDI high-precision subset (MEDI-HPS) includes indications found within either RxNorm or at least two of the three other resources. MEDI-HPS contains 13 304 unique indication pairs regarding 2136 medications. The mean±SD number of indications for each medication in MEDI-HPS is 6.22±6.09. The estimated precision of MEDI-HPS is 92%. Conclusions: MEDI is a publicly available, computable resource that links medications with their indications as represented by concepts and billing codes. MEDI may benefit clinical EMR applications and reuse of EMR data for research. 0 0
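The MEDI abstract above states the MEDI-HPS rule explicitly: keep a medication-indication pair if it is found in RxNorm or in at least two of the three other resources. A small function expressing that rule, with invented example pairs, is sketched below.

```python
# Hedged sketch of the MEDI-HPS selection rule stated in the abstract: keep a
# medication-indication pair if it appears in RxNorm, or in at least two of the
# other three resources. The example support sets are invented.
def in_high_precision_subset(sources):
    """sources: set of resource names supporting one medication-indication pair."""
    others = {"SIDER2", "MedlinePlus", "Wikipedia"}
    return "RxNorm" in sources or len(sources & others) >= 2

print(in_high_precision_subset({"RxNorm"}))                    # True
print(in_high_precision_subset({"Wikipedia"}))                 # False
print(in_high_precision_subset({"Wikipedia", "MedlinePlus"}))  # True
```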
Knowledge base population and visualization using an ontology based on semantic roles Siahbani M.
Vadlapudi R.
Whitney M.
Sarkar A.
AKBC 2013 - Proceedings of the 2013 Workshop on Automated Knowledge Base Construction, Co-located with CIKM 2013 English 2013 This paper extracts facts using "micro-reading" of text in contrast to approaches that extract common-sense knowledge using "macro-reading" methods. Our goal is to extract detailed facts about events from natural language using a predicate-centered view of events (who did what to whom, when and how). We exploit semantic role labels in order to create a novel predicate-centric ontology for entities in our knowledge base. This allows users to find uncommon facts easily. To this end, we tightly couple our knowledge base and ontology to an information visualization system that can be used to explore and navigate events extracted from a large natural language text collection. We use our methodology to create a web-based visual browser of history events in Wikipedia. 0 0
NeuroLex.org: An online framework for neuroscience knowledge Larson S.D.
Martone M.E.
Frontiers in Neuroinformatics English 2013 The ability to transmit, organize, and query information digitally has brought with it the challenge of how to best use this power to facilitate scientific inquiry. Today, few information systems are able to provide detailed answers to complex questions about neuroscience that account for multiple spatial scales, and which cross the boundaries of diverse parts of the nervous system such as molecules, cellular parts, cells, circuits, systems and tissues. As a result, investigators still primarily seek answers to their questions in an increasingly densely populated collection of articles in the literature, each of which must be digested individually. If it were easier to search a knowledge base that was structured to answer neuroscience questions, such a system would enable questions to be answered in seconds that would otherwise require hours of literature review. In this article, we describe NeuroLex.org, a wiki-based website and knowledge management system. Its goal is to bring neurobiological knowledge into a framework that allows neuroscientists to review the concepts of neuroscience, with an emphasis on multiscale descriptions of the parts of nervous systems, aggregate their understanding with that of other scientists, link them to data sources and descriptions of important concepts in neuroscience, and expose parts that are still controversial or missing. To date, the site is tracking ~25,000 unique neuroanatomical parts and concepts in neurobiology spanning experimental techniques, behavioral paradigms, anatomical nomenclature, genes, proteins and molecules. Here we show how the structuring of information about these anatomical parts in the nervous system can be reused to answer multiple neuroscience questions, such as displaying all known GABAergic neurons aggregated in NeuroLex or displaying all brain regions that are known within NeuroLex to send axons into the cerebellar cortex. 0 0
Research on measuring semantic correlation based on the Wikipedia hyperlink network Ye F.
Zhang F.
Luo X.
Xu L.
2013 IEEE/ACIS 12th International Conference on Computer and Information Science, ICIS 2013 - Proceedings English 2013 As a free online encyclopedia with large-scale knowledge coverage, rich semantic information and a quick update speed, Wikipedia brings new ideas to measuring semantic correlation. In this paper, we present a new method for measuring the semantic correlation between words by mining the rich semantic information that exists in Wikipedia. Unlike previous methods that calculate semantic relatedness merely based on the page network or the category network, our method not only takes into account the semantic information of the page network but also combines the semantic information of the category network, which improves the accuracy of the results. In addition, we analyze and evaluate the algorithm by comparing its results with a well-known knowledge base (e.g., HowNet) and with traditional Wikipedia-based methods on the same test set, and demonstrate its superiority. 0 0
The rise of wikidata Vrandecic D. IEEE Intelligent Systems English 2013 Wikipedia was recently enhanced by a knowledge base: Wikidata. Thousands of volunteers who collect facts and their sources help grow and maintain Wikidata. Within only a few months, more than 16 million statements about more than 4 million items have been added to the project, ready to support Wikipedia and to enable and enrich many different types of external applications. 0 0
Towards an automatic creation of localized versions of DBpedia Palmero Aprosio A.
Claudio Giuliano
Lavelli A.
Lecture Notes in Computer Science English 2013 DBpedia is a large-scale knowledge base that exploits Wikipedia as its primary data source. The extraction procedure requires manually mapping Wikipedia infoboxes into the DBpedia ontology. Thanks to crowdsourcing, a large number of infoboxes have been mapped in the English DBpedia. Consequently, the same procedure has been applied to other languages to create the localized versions of DBpedia. However, the number of accomplished mappings is still small and limited to the most frequent infoboxes. Furthermore, mappings need maintenance due to the constant and quick changes of Wikipedia articles. In this paper, we focus on the problem of automatically mapping infobox attributes to properties in the DBpedia ontology, to extend the coverage of the existing localized versions or to build versions from scratch for languages not yet covered. The evaluation has been performed on the Italian mappings. We compared our results with the current mappings on a random sample re-annotated by the authors. We report results comparable to the ones obtained by a human annotator in terms of precision, but our approach leads to a significant improvement in recall and speed. Specifically, we mapped 45,978 Wikipedia infobox attributes to DBpedia properties in 14 different languages for which mappings were not yet available. The resource is made available in an open format. 0 0
Transforming Wikipedia into a large scale multilingual concept network Vivi Nastase
Michael Strube
Artificial Intelligence English 2013 A knowledge base for real-world language processing applications should consist of a large base of facts and reasoning mechanisms that combine them to induce novel and more complex information. This paper describes an approach to deriving such a large scale and multilingual resource by exploiting several facets of the on-line encyclopedia Wikipedia. We show how we can build upon Wikipedia's existing network of categories and articles to automatically discover new relations and their instances. Working on top of this network allows for added information to influence the network and be propagated throughout it using inference mechanisms that connect different pieces of existing knowledge. We then exploit this gained information to discover new relations that refine some of those found in the previous step. The result is a network containing approximately 3.7 million concepts with lexicalizations in numerous languages and 49+ million relation instances. Intrinsic and extrinsic evaluations show that this is a high quality resource and beneficial to various NLP tasks. © 2012 Elsevier B.V. All rights reserved. 0 0
Wiki3C: Exploiting wikipedia for context-aware concept categorization Jiang P.
Hou H.
Long Chen
Shun-ling Chen
Conglei Yao
Chenliang Li
Wang M.
WSDM 2013 - Proceedings of the 6th ACM International Conference on Web Search and Data Mining English 2013 Wikipedia is an important human generated knowledge base containing over 21 million articles organized by millions of categories. In this paper, we exploit Wikipedia for a new task of text mining: Context-aware Concept Categorization. In the task, we focus on categorizing concepts according to their context. We exploit article link feature and category structure in Wikipedia, followed by introducing Wiki3C, an unsupervised and domain independent concept categorization approach based on context. In the approach, we investigate two strategies to select and filter Wikipedia articles for the category representation. Besides, a probabilistic model is employed to compute the semantic relatedness between two concepts in Wikipedia. Experimental evaluation using manually labeled ground truth shows that our proposed Wiki3C can achieve a noticeable improvement over the baselines without considering contextual information. 0 0
Wisdom in the social crowd: An analysis of Quora Gang Wang
Gill K.
Mohanlal M.
Hua Zheng
Zhao B.Y.
WWW 2013 - Proceedings of the 22nd International Conference on World Wide Web English 2013 Efforts such as Wikipedia have shown the ability of user communities to collect, organize and curate information on the Internet. Recently, a number of question and answer (Q&A) sites have successfully built large growing knowledge repositories, each driven by a wide range of questions and answers from its user community. While sites like Yahoo Answers have stalled and begun to shrink, one site still going strong is Quora, a rapidly growing service that augments a regular Q&A system with social links between users. Despite its success, however, little is known about what drives Quora's growth, and how it continues to connect visitors and experts to the right questions as it grows. In this paper, we present results of a detailed analysis of Quora using measurements. We shed light on the impact of three different connection networks (or graphs) inside Quora: a graph connecting topics to users, a social graph connecting users, and a graph connecting related questions. Our results show that heterogeneity in the user and question graphs is a significant contributor to the quality of Quora's knowledge base. One drives the attention and activity of users, and the other directs them to a small set of popular and interesting questions. Copyright is held by the International World Wide Web Conference Committee (IW3C2). 0 0
YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia Johannes Hoffart
Suchanek F.M.
Berberich K.
Gerhard Weikum
Artificial Intelligence English 2013 We present YAGO2, an extension of the YAGO knowledge base, in which entities, facts, and events are anchored in both time and space. YAGO2 is built automatically from Wikipedia, GeoNames, and WordNet. It contains 447 million facts about 9.8 million entities. Human evaluation confirmed an accuracy of 95% of the facts in YAGO2. In this paper, we present the extraction methodology, the integration of the spatio-temporal dimension, and our knowledge representation SPOTL, an extension of the original SPO-triple model to time and space. © 2012 Elsevier B.V. All rights reserved. 0 0
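The YAGO2 entry above describes SPOTL, an extension of subject-predicate-object triples with time and location. A hedged sketch of such a record as a plain data structure follows; the field layout and the example fact are illustrative, not YAGO2's actual serialization.

```python
# Hedged sketch of a SPOTL-style record: the abstract describes extending
# subject-predicate-object triples with time and location. Field layout and
# the example fact are illustrative only.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class SpotlFact:
    subject: str
    predicate: str
    obj: str
    time: Optional[Tuple[str, str]] = None          # (begin, end) as ISO dates
    location: Optional[Tuple[float, float]] = None  # (latitude, longitude)

fact = SpotlFact(
    subject="Albert_Einstein",
    predicate="wasBornIn",
    obj="Ulm",
    time=("1879-03-14", "1879-03-14"),
    location=(48.40, 9.99),
)
print(fact)
```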
A semantic approach to recommending text advertisements for images Weinan Zhang
Tian L.
Xiaohua Sun
Haofen Wang
Yiqin Yu
RecSys'12 - Proceedings of the 6th ACM Conference on Recommender Systems English 2012 In recent years, more and more images have been uploaded and published on the Web. Along with text Web pages, images have been becoming important media to place relevant advertisements. Visual contextual advertising, a young research area, refers to finding relevant text advertisements for a target image without any textual information (e.g., tags). There are two existing approaches, advertisement search based on image annotation, and more recently, advertisement matching based on feature translation between images and texts. However, the state of the art fails to achieve satisfactory results due to the fact that recommended advertisements are syntactically matched but semantically mismatched. In this paper, we propose a semantic approach to improving the performance of visual contextual advertising. More specifically, we exploit a large high-quality image knowledge base (ImageNet) and a widely-used text knowledge base (Wikipedia) to build a bridge between target images and advertisements. The image-advertisement match is built by mapping images and advertisements into the respective knowledge bases and then finding semantic matches between the two knowledge bases. The experimental results show that semantic match outperforms syntactic match significantly using test images from Flickr. We also show that our approach gives a large improvement of 16.4% on the precision of the top 10 matches over previous work, with more semantically relevant advertisements recommended. Copyright © 2012 by the Association for Computing Machinery, Inc. (ACM). 0 0
Automatic taxonomy extraction in different languages using wikipedia and minimal language-specific information Dominguez Garcia R.
Schmidt S.
Rensing C.
Steinmetz R.
Lecture Notes in Computer Science English 2012 Knowledge bases extracted from Wikipedia are particularly useful for various NLP and Semantic Web applications due to their coverage, actuality and multilingualism. This has led to many approaches for automatic knowledge base extraction from Wikipedia. Most of these approaches rely on the English Wikipedia as it is the largest Wikipedia version. However, each Wikipedia version contains socio-cultural knowledge, i.e. knowledge with relevance for a specific culture or language. In this work, we describe a method for extracting a large set of hyponymy relations from the Wikipedia category system that can be used to acquire taxonomies in multiple languages. More specifically, we describe a set of 20 features that can be used for Hyponymy Detection without using additional language-specific corpora. Finally, we evaluate our approach on Wikipedia in five different languages and compare the results with the WordNet taxonomy and a multilingual approach based on the interwiki links of Wikipedia. 0 0
Building a large scale knowledge base from Chinese Wiki Encyclopedia Zhe Wang
Jing-Woei Li
Pan J.Z.
Lecture Notes in Computer Science English 2012 DBpedia has proved to be a successful structured knowledge base, and large-scale Semantic Web data has been built by using DBpedia as the central interlinking hub of the Web of Data in English. In Chinese, however, due to the heavy imbalance in size (no more than one tenth) between the English and Chinese Wikipedia, little Chinese linked data has been published and linked to DBpedia, which hinders structured knowledge sharing both within Chinese resources and across languages. This paper aims at building a large-scale Chinese structured knowledge base from Hudong, one of the largest Chinese wiki encyclopedia websites. An upper-level ontology schema in Chinese is first learned based on the category system and infobox information in Hudong. In total, 19542 concepts are inferred, organized in a hierarchy of up to 20 levels. 2381 properties with domain and range information are learned from the attributes in the Hudong infoboxes. Then, 802593 instances are extracted and described using the concepts and properties in the learned ontology. These extracted instances cover a wide range of things, including persons, organizations, places and so on. Among all the instances, 62679 are linked to identical instances in DBpedia. Moreover, the paper provides an RDF dump or SPARQL access to the established Chinese knowledge base. The general upper-level ontology and wide coverage make the knowledge base a valuable Chinese semantic resource. It can not only be used in building Chinese linked data, the foundation for building a multilingual knowledge base across heterogeneous resources in different languages, but can also facilitate many useful applications of a large-scale knowledge base such as knowledge question answering and semantic search. 0 0
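The entry above mentions that the resulting knowledge base is accessible as an RDF dump or via SPARQL. A minimal, self-contained sketch of that access pattern follows, using rdflib on an invented RDF snippet; the prefixes, class names, and triples are placeholders, not data from the Hudong-derived knowledge base.

```python
# Hedged sketch: loading a small RDF extract and querying it with SPARQL, as one
# would with the RDF dump the abstract mentions. The triples are placeholders.
import rdflib

g = rdflib.Graph()
g.parse(data="""
@prefix ex: <http://example.org/> .
ex:item1 a ex:Person ; ex:name "示例人物" .
ex:item2 a ex:Place  ; ex:name "示例地点" .
""", format="turtle")

results = g.query("""
    SELECT ?s ?name WHERE {
        ?s a <http://example.org/Person> ;
           <http://example.org/name> ?name .
    }
""")
for s, name in results:
    print(s, name)
```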
Collective context-aware topic models for entity disambiguation Sen P. WWW'12 - Proceedings of the 21st Annual Conference on World Wide Web English 2012 A crucial step in adding structure to unstructured data is to identify references to entities and disambiguate them. Such disambiguated references can help enhance readability and draw similarities across different pieces of running text in an automated fashion. Previous research has tackled this problem by first forming a catalog of entities from a knowledge base, such as Wikipedia, and then using this catalog to disambiguate references in unseen text. However, most of the previously proposed models either do not use all text in the knowledge base, potentially missing out on discriminative features, or do not exploit word-entity proximity to learn high-quality catalogs. In this work, we propose topic models that keep track of the context of every word in the knowledge base; so that words appearing within the same context as an entity are more likely to be associated with that entity. Thus, our topic models utilize all text present in the knowledge base and help learn high-quality catalogs. Our models also learn groups of co-occurring entities thus enabling collective disambiguation. Unlike most previous topic models, our models are non-parametric and do not require the user to specify the exact number of groups present in the knowledge base. In experiments performed on an extract of Wikipedia containing almost 60,000 references, our models outperform SVM-based baselines by as much as 18% in terms of disambiguation accuracy translating to an increment of almost 11,000 correctly disambiguated references. 0 0
Combining AceWiki with a CAPTCHA system for collaborative knowledge acquisition Nalepa G.J.
Adrian W.T.
Szymon Bobek
Maslanka P.
Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI English 2012 Formalized knowledge representation methods make it possible to build useful and semantically enriched knowledge bases which can be shared and reasoned upon. Unfortunately, knowledge acquisition for such formalized systems is often a time-consuming and tedious task. The process requires a domain expert to provide terminological knowledge, a knowledge engineer capable of modeling knowledge in a given formalism, and also a great amount of instance data to populate the knowledge base. We propose a CAPTCHA-like system called AceCAPTCHA in which users are asked questions in a controlled natural language. The questions are generated automatically based on a terminology stored in a knowledge base of the system, and the answers provided by users serve as instance data to populate it. The implementation uses the AceWiki semantic wiki and a reasoning engine written in Prolog. 0 0
Community optimization: Function optimization by a simulated web community Veenhuis C.B. International Conference on Intelligent Systems Design and Applications, ISDA English 2012 In recent years a number of web-technology supported communities of humans have been developed. Such a web community is able to let emerge a collective intelligence with a higher performance in solving problems than the single members of the community. Based on the successes of collective intelligence systems like Wikipedia, the web encyclopedia, the question arises, whether such a collaborative web community could also be capable of function optimization. This paper introduces an optimization algorithm called Community Optimization (CO), which optimizes a function by simulating a collaborative web community, which edits or improves an article-base, or, more general, a knowledge-base. In order to realize this, CO implements a behavioral model derived from the human behavior that can be observed within certain types of web communities (e.g., Wikipedia or open source communities). The introduced CO method is applied to four well-known benchmark problems. CO significantly outperformed the Fully Informed Particle Swarm Optimization as well as two Differential Evolution approaches in all four cases especially in higher dimensions. 0 0
Cross-lingual knowledge linking across wiki knowledge bases Zhe Wang
Jing-Woei Li
Tang J.
WWW'12 - Proceedings of the 21st Annual Conference on World Wide Web English 2012 Wikipedia has become one of the largest knowledge bases on the Web. It attracted 513 million page views per day in January 2012. However, one critical issue for Wikipedia is that articles in different languages are very unbalanced. For example, the number of articles on Wikipedia in English has reached 3.8 million, while the number of Chinese articles is still less than half a million and there are only 217 thousand cross-lingual links between articles of the two languages. On the other hand, there are more than 3.9 million Chinese Wiki articles on Baidu Baike and Hudong.com, two popular encyclopedias in Chinese. One important question is how to link the knowledge entries distributed in different knowledge bases. This will immensely enrich the information in the online knowledge bases and benefit many applications. In this paper, we study the problem of cross-lingual knowledge linking and present a linkage factor graph model. Features are defined according to some interesting observations. Experiments on the Wikipedia data set show that our approach can achieve a high precision of 85.8% with a recall of 88.1%. The approach found 202,141 new cross-lingual links between English Wikipedia and Baidu Baike. 0 0
DBpedia ontology enrichment for inconsistency detection Topper G.
Knuth M.
Sack H.
ACM International Conference Proceeding Series English 2012 In recent years the Web of Data has experienced an extraordinary development: an increasing amount of Linked Data is available on the World Wide Web (WWW) and new use cases are emerging continually. However, the provided data is only valuable if it is accurate and without contradictions. One essential part of the Web of Data is DBpedia, which covers the structured data of Wikipedia. Due to its automatic extraction from Wikipedia resources that have been created by various contributors, DBpedia data is often error-prone. In order to enable the detection of inconsistencies, this work focuses on the enrichment of the DBpedia ontology by statistical methods. Taking the enriched ontology as a basis, the extraction of Wikipedia data is adapted so that inconsistencies are detected during extraction. The creation of suitable correction suggestions should encourage users to solve existing errors and thus create a knowledge base of higher quality. Copyright 2012 ACM. 0 0
Exploiting a web-based encyclopedia as a knowledge base for the extraction of multilingual terminology Sadat F. Lecture Notes in Computer Science English 2012 Multilingual linguistic resources are usually constructed from parallel corpora, but since these corpora are available only for selected text domains and language pairs, the potential of other resources is being explored as well. This article seeks to explore and to exploit the idea of using multilingual web-based encyclopaedias such as Wikipedia as comparable corpora for bilingual terminology extraction. We propose an approach to extract terms and their translations from different types of Wikipedia link information and data. The next step will be using linguistic-based information to re-rank and filter the extracted term candidates in the target language. Preliminary evaluations using the combined statistics-based and linguistic-based approaches were applied on different pairs of languages including Japanese, French and English. These evaluations showed a real open improvement and a good quality of the extracted term candidates for building or enriching multilingual anthologies, dictionaries or feeding a cross-language information retrieval system with the related expansion terms of the source query. 0 0
Exploiting the Wikipedia structure in local and global classification of taxonomic relations Do Q.X.
Dan Roth
Natural Language Engineering English 2012 Determining whether two terms have an ancestor relation (e.g. Toyota Camry and car) or a sibling relation (e.g. Toyota and Honda) is an essential component of textual inference in Natural Language Processing applications such as Question Answering, Summarization, and Textual Entailment. Significant work has been done on developing knowledge sources that could support these tasks, but these resources usually suffer from low coverage, noise, and are inflexible when dealing with ambiguous and general terms that may not appear in any stationary resource, making their use as general purpose background knowledge resources difficult. In this paper, rather than building a hierarchical structure of concepts and relations, we describe an algorithmic approach that, given two terms, determines the taxonomic relation between them using a machine learning-based approach that makes use of existing resources. Moreover, we develop a global constraint-based inference process that leverages an existing knowledge base to enforce relational constraints among terms and thus improves the classifier predictions. Our experimental evaluation shows that our approach significantly outperforms other systems built upon the existing well-known knowledge sources. 0 0
Exploring appropriation of enterprise wikis: A multiple-case study Stocker A.
Richter A.
Hoefler P.
Tochtermann K.
Computer-Supported Cooperative Work English 2012 The purpose of this paper is to provide both application-oriented researchers and practitioners with detailed insights into conception, implementation, and utilization of intraorganizational wikis to support knowledge management and group work. Firstly, we report on three case studies and describe how wikis have been appropriated in the context of a concrete practice. Our study reveals that the wikis have been used as Knowledge Base, Encyclopedia and Support Base, respectively. We present the identified practices as a result of the wiki appropriation process and argue that due to their open and flexible nature these wikis have been appropriated according to the users' needs. Our contribution helps to understand how platforms support working practices that have not been supported by groupware before, or at least not in the same way. Secondly, three detailed implementation reports uncover many aspects of wiki projects, e.g., different viewpoints of managers and users, an investigation of other sources containing business-relevant information, and perceived obstacles to wiki projects. In this context, our study generates a series of lessons learned for people who intend to implement wikis in their own organizations, including the awareness of usage potential, the need for additional managerial support, and clear communication strategies to promote wiki usage. 0 0
Extraction of temporal facts and events from Wikipedia Kuzey E.
Gerhard Weikum
ACM International Conference Proceeding Series English 2012 Recently, large-scale knowledge bases have been constructed by automatically extracting relational facts from text. Unfortunately, most of the current knowledge bases focus on static facts and ignore the temporal dimension. However, the vast majority of facts are evolving with time or are valid only during a particular time period. Thus, time is a significant dimension that should be included in knowledge bases. In this paper, we introduce a complete information extraction framework that harvests temporal facts and events from semi-structured data and free text of Wikipedia articles to create a temporal ontology. First, we extend a temporal data representation model by making it aware of events. Second, we develop an information extraction method which harvests temporal facts and events from Wikipedia infoboxes, categories, lists, and article titles in order to build a temporal knowledge base. Third, we show how the system can use its extracted knowledge for further growing the knowledge base. We demonstrate the effectiveness of our proposed methods through several experiments. We extracted more than one million temporal facts with precision over 90% for extraction from semi-structured data and almost 70% for extraction from text. 0 0
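The temporal-extraction entry above harvests temporal facts from semi-structured infobox data. A minimal sketch of one such extraction follows; the infobox snippet, template pattern, and relation name are invented for illustration and do not reflect the paper's actual extraction rules.

```python
# Hedged sketch of harvesting a temporal fact from semi-structured infobox text.
# The infobox snippet and the relation name are invented for illustration.
import re

infobox = {
    "name": "Example Person",
    "birth_date": "{{Birth date|1950|7|21}}",   # typical Wikipedia template form
}

match = re.search(r"\{\{Birth date\|(\d{4})\|(\d{1,2})\|(\d{1,2})\}\}",
                  infobox["birth_date"])
if match:
    year, month, day = match.groups()
    fact = ("Example Person", "wasBornOnDate",
            f"{year}-{int(month):02d}-{int(day):02d}")
    print(fact)   # ('Example Person', 'wasBornOnDate', '1950-07-21')
```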
Flow over periodic hills - Test case for ERCOFTAC knowledge base wiki Breuer M.
Rapp C.
Manhart M.
ECCOMAS 2012 - European Congress on Computational Methods in Applied Sciences and Engineering, e-Book Full Papers English 2012 The European Research Community on Flow, Turbulence and Combustion (ERCOFTAC) in collaboration with the EU-Network Project QNET-CFD have established a knowledge base wiki accessible via http://qnet-ercoftac.cfms.org.uk in order to provide reliable CFD test cases and corresponding guidelines for solving the individual flow problems. For establishing quality and trust in CFD, experimental and numerical data on a wide range of flows are placed at the disposal of the fluid mechanics community. It includes application challenges for individual industrial sectors and more generic flows. For the latter, detailed experimental data as well as highly resolved numerical simulations are available. The objective of the present contribution is to report on a new generic test case which has been recently added to the data base. The results presented are based on the one hand on experimental investigations using particle image velocimetry and on the other hand on numerical predictions relying on direct numerical simulations and large-eddy simulations with two independent codes. After presenting some of the data available, hints and guidelines for the appropriate prediction of this flow case are given. 0 0
Jointly disambiguating and clustering concepts and entities with markov logic Angela Fahrni
Michael Strube
24th International Conference on Computational Linguistics - Proceedings of COLING 2012: Technical Papers English 2012 We present a novel approach for jointly disambiguating and clustering known and unknown concepts and entities with Markov Logic. Concept and entity disambiguation is the task of identifying the correct concept or entity in a knowledge base for a single- or multi-word noun (mention) given its context. Concept and entity clustering is the task of clustering mentions so that all mentions in one cluster refer to the same concept or entity. The proposed model (1) is global, i.e. a group of mentions in a text is disambiguated in one single step combining various global and local features, and (2) performs disambiguation, unknown concept and entity detection, and clustering jointly. The disambiguation is performed with respect to Wikipedia. The model is trained once on Wikipedia articles and then applied to and evaluated on different data sets originating from newspapers, audio transcripts and internet sources. 0 0
LINDEN: Linking named entities with knowledge base via semantic knowledge Shen W.
Wang J.
Luo P.
Wang M.
WWW'12 - Proceedings of the 21st Annual Conference on World Wide Web English 2012 Integrating the extracted facts with an existing knowledge base has raised an urgent need to address the problem of entity linking. Specifically, entity linking is the task of linking an entity mention in text with the corresponding real-world entity in the existing knowledge base. However, this task is challenging due to name ambiguity, textual inconsistency, and lack of world knowledge in the knowledge base. Several methods have been proposed to tackle this problem, but they are largely based on the co-occurrence statistics of terms between the text around the entity mention and the document associated with the entity. In this paper, we propose LINDEN, a novel framework to link named entities in text with a knowledge base unifying Wikipedia and WordNet, by leveraging the rich semantic knowledge embedded in Wikipedia and the taxonomy of the knowledge base. We extensively evaluate the performance of our proposed LINDEN over two public data sets and empirical results show that LINDEN significantly outperforms the state-of-the-art methods in terms of accuracy. 0 0
Mining Wikipedia's snippets graph: First step to build a new knowledge base Wira-Alam A.
Mathiak B.
CEUR Workshop Proceedings English 2012 In this paper, we discuss the aspects of mining links and text snippets from Wikipedia as a new knowledge base. Current knowledge bases, e.g. DBpedia [1], cover mainly the structured part of Wikipedia, but not the content as a whole. Acting as a complement, we focus on extracting information from the text of the articles. We extract a database of the hyperlinks between Wikipedia articles and populate them with the textual context surrounding each hyperlink. This would be useful for network analysis, e.g. to measure the influence of one topic on another, or for question-answering directly (for stating the relationship between two entities). First, we describe the technical parts related to extracting the data from Wikipedia. Second, we specify how to represent the extracted data as an extended triple through a Web service. Finally, we discuss the expected usage possibilities and the challenges. 0 0
Mining spatio-temporal patterns in the presence of concept hierarchies Anh L.V.Q.
Gertz M.
Proceedings - 12th IEEE International Conference on Data Mining Workshops, ICDMW 2012 English 2012 In the past, approaches to mining spatial and spatio-temporal data for interesting patterns have mainly concentrated on data obtained through observations and simulations where positions of objects, such as areas, vehicles, or persons, are collected over time. In the past couple of years, however, new datasets have been built by automatically extracting facts, as subject-predicate-object triples, from semistructured information sources such as Wikipedia. Recently some approaches, for example, in the context of YAGO2, have extended such facts by adding temporal and spatial information. The presence of such new data sources gives rise to new approaches for discovering spatio-temporal patterns. In this paper, we present a framework in support of the discovery of interesting spatio-temporal patterns from knowledge base datasets. Different from traditional approaches to mining spatio-temporal data, we focus on mining patterns at different levels of granularity by exploiting concept hierarchies, which are a key ingredient in knowledge bases. We introduce a pattern specification language and outline an algorithmic approach to efficiently determine complex patterns. We demonstrate the utility of our framework using two different real-world datasets from YAGO2 and the Website eventful.com. 0 0
NAMED ENTITY DISAMBIGUATION: A HYBRID APPROACH Nguyen H.T.
Cao T.H.
International Journal of Computational Intelligence Systems English 2012 Semantic annotation of named entities for enriching unstructured content is a critical step in development of Semantic Web and many Natural Language Processing applications. To this end, this paper addresses the named entity disambiguation problem that aims at detecting entity mentions in a text and then linking them to entries in a knowledge base. In this paper, we propose a hybrid method, combining heuristics and statistics, for named entity disambiguation. The novelty is that the disambiguation process is incremental and includes several rounds that filter the candidate referents, by exploiting previously identified entities and extending the text by those entity attributes every time they are successfully resolved in a round. Experiments are conducted to evaluate and show the advantages of the proposed method. The experiment results show that our approach achieves high accuracy and can be used to construct a robust entity disambiguation system. 0 0
REWOrD: Semantic relatedness in the web of data Pirro G. Proceedings of the National Conference on Artificial Intelligence English 2012 This paper presents REWOrD, an approach to compute semantic relatedness between entities in the Web of Data representing real word concepts. REWOrD exploits the graph nature of RDF data and the SPARQL query language to access this data. Through simple queries, REWOrD constructs weighted vectors keeping the informativeness of RDF predicates used to make statements about the entities being compared. The most informative path is also considered to further refine informativeness. Relatedness is then computed by the cosine of the weighted vectors. Differently from previous approaches based on Wikipedia, REWOrD does not require any preprocessing or custom data transformation. Indeed, it can leverage whatever RDF knowledge base as a source of background knowledge. We evaluated REWOrD in different settings by using a new dataset of real word entities and investigate its flexibility. As compared to related work on classical datasets, REWOrD obtains comparable results while, on one side, it avoids the burden of preprocessing and data transformation and, on the other side, it provides more flexibility and applicability in a broad range of domains. Copyright © 2012, Association for the Advancement of Artificial Intelligence. All rights reserved. 0 0
Tapping into knowledge base for concept feedback: Leveraging ConceptNet to improve search results for difficult queries Kotov A.
Zhai C.X.
WSDM 2012 - Proceedings of the 5th ACM International Conference on Web Search and Data Mining English 2012 Query expansion is an important and commonly used technique for improving Web search results. Existing methods for query expansion have mostly relied on global or local analysis of document collection, click-through data, or simple ontologies such as WordNet. In this paper, we present the results of a systematic study of the methods leveraging the ConceptNet knowledge base, an emerging new Web resource, for query expansion. Specifically, we focus on the methods leveraging ConceptNet to improve the search results for poorly performing (or difficult) queries. Unlike other lexico-semantic resources, such as WordNet and Wikipedia, which have been extensively studied in the past, ConceptNet features a graph-based representation model of commonsense knowledge, in which the terms are conceptually related through rich relational ontology. Such representation structure enables complex, multi-step inferences between the concepts, which can be applied to query expansion. We first demonstrate through simulation experiments that expanding queries with the related concepts from ConceptNet has great potential for improving the search results for difficult queries. We then propose and study several supervised and unsupervised methods for selecting the concepts from ConceptNet for automatic query expansion. The experimental results on multiple data sets indicate that the proposed methods can effectively leverage ConceptNet to improve the retrieval performance of difficult queries both when used in isolation as well as in combination with pseudo-relevance feedback. Copyright 2012 ACM. 0 0
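The ConceptNet entry above expands difficult queries with related concepts from a commonsense graph. The sketch below illustrates only the expansion step over a tiny invented graph; it does not use ConceptNet's real data or API, and the selection heuristic is a placeholder for the paper's supervised and unsupervised concept-selection methods.

```python
# Hedged sketch of concept-feedback expansion: pick concepts related to the
# query terms in a commonsense graph and append them to the query. The tiny
# graph below is an invented stand-in, not ConceptNet data.
concept_graph = {
    "jaguar": ["cat", "car", "animal"],
    "speed": ["velocity", "fast"],
}

def expand_query(terms, graph, per_term=2):
    expanded = list(terms)
    for t in terms:
        expanded.extend(graph.get(t, [])[:per_term])   # take a few related concepts
    return expanded

print(expand_query(["jaguar", "speed"], concept_graph))
# ['jaguar', 'speed', 'cat', 'car', 'velocity', 'fast']
```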
The role of AI in wisdom of the crowds for the social construction of knowledge on sustainability Maher M.L.
Fisher D.H.
AAAI Spring Symposium - Technical Report English 2012 One of the original applications of crowdsourcing the construction of knowledge is Wikipedia, which relies entirely on people to contribute, extend, and modify the representation of knowledge. This paper presents a case for combining AI and wisdom of the crowds for the social construction of knowledge. Our social-computational approach to collective intelligence combines the strengths of human cognitive diversity in producing content and the capabilities of an AI, through methods such as topic modeling, to link and synthesize across these human contributions. In addition to drawing from established domains such as Wikipedia for inspiration and guidance, we present the design of a system that incorporates AI into wisdom of the crowds to develop a knowledge base on sustainability. In this setting the AI plays the role of scholar, as might many of the other participants, drawing connections and synthesizing across contributions. We close with a general discussion, speculating on educational implications and other roles that an AI can play within an otherwise collective human intelligence. Copyright © 2012, Association for the Advancement of Artificial Intelligence. All rights reserved. 0 0
Twitter user modeling and tweets recommendation based on wikipedia concept graph ChengLang Lu
Lam W.
YanChun Zhang
AAAI Workshop - Technical Report English 2012 As a microblogging service, Twitter is playing a more and more important role in our life. Users follow various accounts, such as friends or celebrities, to get the most recent information. However, as one follows more and more people, he/she may be overwhelmed by the huge amount of status updates. Twitter messages are only displayed by time recency, which means if one cannot read all messages, he/she may miss some important or interesting tweets. In this paper, we propose to re-rank tweets in user's timeline, by constructing a user profile based on user's previous tweets and measuring the relevance between a tweet and user interest. The user interest profile is represented as concepts from Wikipedia, which is quite a large and inter-linked online knowledge base. We make use of Explicit Semantic Analysis algorithm to extract related concepts from tweets, and then expand user's profile by random walk on Wikipedia concept graph, utilizing the inter-links between Wikipedia articles. Our experiments show that our model is effective and efficient to recommend tweets to users. Copyright © 2012, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. 0 0
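The tweet-recommendation entry above expands a user's interest profile by a random walk on the Wikipedia concept graph. A hedged sketch of such an expansion with personalized PageRank on a toy graph follows; the graph, seed concepts, and damping factor are illustrative assumptions.

```python
# Hedged sketch: expand a user-interest profile by a random walk (personalized
# PageRank) over a concept graph. Graph and seed concepts are made up.
import networkx as nx

g = nx.Graph()
g.add_edges_from([
    ("Machine learning", "Artificial intelligence"),
    ("Artificial intelligence", "Robotics"),
    ("Machine learning", "Statistics"),
    ("Statistics", "Probability"),
])

# Concepts extracted from the user's previous tweets act as the restart set.
seed = {"Machine learning": 1.0}

scores = nx.pagerank(g, alpha=0.85, personalization=seed)
for concept, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{concept}: {score:.3f}")
```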
Vulnerapedia: Security knowledge management with an ontology Blanco F.J.
Fernandez-Villamor J.I.
Iglesias C.A.
ICAART 2012 - Proceedings of the 4th International Conference on Agents and Artificial Intelligence English 2012 Ontological engineering enables efficient management of security data, generating security knowledge. We use a stepwise methodology, defining a main ontology in the web application security domain. Next, extraction and integration processes translate unstructured data into quality security knowledge. We then check that the ontology can support the management processes involved. A social tool is implemented to wrap the knowledge in an accessible way. It opens up the security knowledge to encourage people to collaboratively use and extend it. 0 0
A category-driven approach to deriving domain specific subset of Wikipedia Korshunov A.
Denis Turdakov
Jeong J.
Lee M.
Moon C.
CEUR Workshop Proceedings English 2011 While many researchers attempt to build up different kinds of ontologies by means of Wikipedia, the possibility of deriving a high-quality domain-specific subset of Wikipedia using its own category structure still remains undervalued. We prove the necessity of such processing in this paper and also propose an appropriate technique. As a result, the size of the knowledge base for our text processing framework has been reduced by more than an order of magnitude, while the precision of disambiguating musical metadata (ID3 tags) has decreased from 98% to 64%. 0 0
A framework for integrating DBpedia in a multi-modality ontology news image retrieval system Khalid Y.I.A.
Noah S.A.
2011 International Conference on Semantic Technology and Information Retrieval, STAIR 2011 English 2011 Knowledge-sharing communities like Wikipedia and automated extraction efforts like DBpedia enable the large-scale construction of machine-processable knowledge bases with relational facts about entities. These resources give researchers a great opportunity to use them as domain concepts bridging low-level features and high-level concepts for image retrieval. Collections of images attached to entities, such as on-line news articles with images, are abundant on the Internet. Still, it is difficult to retrieve accurate information on these entities. Using entity names in a search engine yields large lists, but often results in imprecise and unsatisfactory outcomes. Our goal is to populate a knowledge base with on-line image news resources in the BBC sport domain. This system will yield high precision and high recall and include diverse sports photos for specific entities. A multi-modality ontology retrieval system, with relational facts about entities for generating expanded queries, will be used to retrieve results. DBpedia will be used as a domain sport ontology description, and will be integrated with a textual description and a visual description, both generated by hand. To overcome semantic interoperability issues between ontologies, automated ontology alignment is used. In addition, visual similarity measures based on MPEG-7 descriptions and SIFT features are used for higher diversity in the final rankings. 0 0
Aging-kb: A knowledge base for the study of the aging process Becker K.G.
Holmes K.A.
YanChun Zhang
Mechanisms of Ageing and Development English 2011 As the science of the aging process moves forward, a recurring challenge is the integration of multiple types of data and information with classical aging theory while disseminating that information to the scientific community. Here we present AGING-kb, a public knowledge base with the goal of conceptualizing and presenting fundamental aspects of the study of the aging process. Aging-kb has two interconnected parts, the Aging-kb tree and the Aging Wiki. The Aging-kb tree is a simple intuitive dynamic tree hierarchy of terms describing the field of aging from the general to the specific. This enables the user to see relationships between areas of aging research in a logical comparative fashion. The second part is a specialized Aging Wiki which allows expert definition, description, supporting information, and documentation of each aging keyword term found in the Aging-kb tree. The Aging Wiki allows community participation in describing and defining concepts and terms in the Wiki format. This aging knowledge base provides a simple intuitive interface to the complexities of aging. 0 0
Beyond the bag-of-words paradigm to enhance information retrieval applications Paolo Ferragina Proceedings - 4th International Conference on SImilarity Search and APplications, SISAP 2011 English 2011 The typical IR approach to indexing, clustering, classification and retrieval, just to name a few, is the one based on the bag-of-words paradigm. It eventually transforms a text into an array of terms, possibly weighted (with tf-idf scores or derivatives), and then represents that array via points in a high-dimensional space. It is therefore syntactic and unstructured, in the sense that different terms lead to different dimensions. Co-occurrence detection and other processing steps have thus been proposed (see e.g. LSI, spectral analysis [7]) to identify the existence of those relations, but everyone is aware of the limitations of this approach, especially in the expanding context of short (and thus poorly composed) texts, such as the snippets of search-engine results, the tweets of a Twitter channel, the items of a news feed, the posts of a blog, or advertisement messages. A good deal of recent work attempts to go beyond this paradigm by enriching the input text with additional structured annotations. This general idea has been pursued in the literature in two distinct ways. One consists of extending the classic term-based vector-space model with additional dimensions corresponding to features (concepts) extracted from an external knowledge base, such as DMOZ, Wikipedia, or even the whole Web (see e.g. [4, 5, 12]). The pro of this approach is to extend the bag-of-words scheme with more concepts, thus possibly allowing the identification of related texts which are syntactically far apart. The con resides in the contamination of these vectors by unrelated (but common) concepts retrieved via the syntactic queries. The second way consists of identifying in the input text short and meaningful sequences of terms (aka spots) which are then connected to unambiguous concepts drawn from a catalog. The catalog can be formed by either a small set of specifically recognized types, most often people and locations (aka named entities, see e.g. [13, 14]), or it can consist of millions of concepts drawn from a large knowledge base, such as Wikipedia. This latter catalog is ever-expanding and currently offers the best trade-off between a catalog with a rigorous structure but low coverage (like WordNet, CYC, TAP), and a large text collection with wide coverage but unstructured and noisy content (like the whole Web). To understand how this annotation works, consider the following short news item: "Diego Maradona won against Mexico". The goal of the annotation is to detect "Diego Maradona" and "Mexico" as spots, and then hyper-link them with the Wikipedia pages which deal with the former Argentina coach and the football team of Mexico. The annotator uses as spots the anchor texts which occur in Wikipedia pages, and as possible concepts for each spot the (possibly many) pages pointed to in Wikipedia by that spot/anchor. 0 0
Capability modeling of knowledge-based agents for commonsense knowledge integration Kuo Y.-L.
Hsu J.Y.-J.
Lecture Notes in Computer Science English 2011 Robust intelligent systems require commonsense knowledge. While significant progress has been made in building large commonsense knowledge bases, they are intrinsically incomplete. It is difficult to combine multiple knowledge bases due to their different choices of representation and inference mechanisms, thereby limiting users to one knowledge base and its reasoning methods for any specific task. This paper presents a multi-agent framework for commonsense knowledge integration, and proposes an approach to capability modeling of knowledge bases without a common ontology. The proposed capability model provides a general description of large heterogeneous knowledge bases, such that contents accessible by the knowledge-based agents may be matched up against specific requests. The concept correlation matrix of a knowledge base is transformed into a k-dimensional vector space using low-rank approximation for dimensionality reduction. Experiments are performed with the matchmaking mechanism for the commonsense knowledge integration framework using the capability models of ConceptNet, WordNet, and Wikipedia. In the user study, the matchmaking results are compared with the ranked lists produced by online users, showing that over 85% of them are accurate and positively correlated with the user-produced ranked lists. 0 0
Creating and Exploiting a Hybrid Knowledge Base for Linked Data Zareen Syed
Tim Finin
Communications in Computer and Information Science English 2011 Twenty years ago Tim Berners-Lee proposed a distributed hypertext system based on standard Internet protocols. The Web that resulted fundamentally changed the ways we share information and services, both on the public Internet and within organizations. That original proposal contained the seeds of another effort that has not yet fully blossomed: a Semantic Web designed to enable computer programs to share and understand structured and semi-structured information easily. We will review the evolution of the idea and technologies to realize a Web of Data and describe how we are exploiting them to enhance information retrieval and information extraction. A key resource in our work is Wikitology, a hybrid knowledge base of structured and unstructured information extracted from Wikipedia. 0 0
Einstein: Physicist or vegetarian? Summarizing semantic type graphs for knowledge discovery Tylenda T.
Sozio M.
Gerhard Weikum
Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011 English 2011 The Web and, in particular, knowledge-sharing communities such as Wikipedia contain a huge amount of information encompassing disparate and diverse fields. Knowledge bases such as DBpedia or Yago represent the data in a concise and more structured way, bearing the potential of bringing database tools to Web search. The wealth of data, however, poses the challenge of how to retrieve important and valuable information, which is often intertwined with trivial and less important details. This calls for an efficient and automatic summarization method. In this demonstration proposal, we consider the novel problem of summarizing the information related to a given entity, like a person or an organization. To this end, we utilize the rich type graph that knowledge bases provide for each entity, and define the problem of selecting the best cost-restricted subset of types as a summary with good coverage of salient properties. We propose a demonstration of our system which allows the user to specify the entity to summarize and an upper bound on the cost of the resulting summary, as well as to browse the knowledge base in a simpler and more intuitive manner. 0 0
Emergent verbal behaviour in human-robot interaction Kristiina Jokinen
Graham Wilcock
2011 2nd International Conference on Cognitive Infocommunications, CogInfoCom 2011 English 2011 The paper describes emergent verbal behaviour that arises when speech components are added to a robotics simulator. In the existing simulator the robot performs its activities silently. When speech synthesis is added, the first level of emergent verbal behaviour is that the robot produces spoken monologues giving a stream of simple explanations of its movements. When speech recognition is added, human-robot interaction can be initiated by the human, using voice commands to direct the robot's movements. In addition, cooperative verbal behaviour emerges when the robot modifies its own verbal behaviour in response to being asked by the human to talk less or more. The robotics framework supports different behavioural paradigms, including finite state machines, reinforcement learning and fuzzy decisions. By combining finite state machines with the speech interface, spoken dialogue systems based on state transitions can be implemented. These dialogue systems exemplify emergent verbal behaviour that is robot-initiated: the robot asks appropriate questions in order to achieve the dialogue goal. The paper mentions current work on using Wikipedia as a knowledge base for open-domain dialogues, and suggests promising ideas for topic-tracking and robot-initiated conversational topics. 0 0
Exploring entity relations for named entity disambiguation Ploch D. ACL HLT 2011 - 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of Student Session English 2011 Named entity disambiguation is the task of linking an entity mention in a text to the correct real-world referent predefined in a knowledge base, and is a crucial subtask in many areas like information retrieval or topic detection and tracking. Named entity disambiguation is challenging because entity mentions can be ambiguous and an entity can be referenced by different surface forms. We present an approach that exploits Wikipedia relations between entities co-occurring with the ambiguous form to derive a range of novel features for classifying candidate referents. We find that our features improve disambiguation results significantly over a strong popularity baseline, and are especially suitable for recognizing entities not contained in the knowledge base. Our system achieves state-of-the-art results on the TAC-KBP 2009 dataset. 0 0
Extracting information about security vulnerabilities from Web text Mulwad V.
Li W.
Joshi A.
Tim Finin
Viswanathan K.
Proceedings - 2011 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT 2011 English 2011 The Web is an important source of information about computer security threats, vulnerabilities and cyberattacks. We present initial work on developing a framework to detect and extract information about vulnerabilities and attacks from Web text. Our prototype system uses Wikitology, a general purpose knowledge base derived from Wikipedia, to extract concepts that describe specific vulnerabilities and attacks, map them to related concepts from DBpedia and generate machine understandable assertions. Such a framework will be useful in adding structure to already existing vulnerability descriptions as well as detecting new ones. We evaluate our approach against vulnerability descriptions from the National Vulnerability Database. Our results suggest that it can be useful in monitoring streams of text from social media or chat rooms to identify potential new attacks and vulnerabilities or to collect data on the spread and volume of existing ones. 0 0
First steps beyond the bag-of-words representation of short texts Paolo Ferragina
Ugo Scaiella
CEUR Workshop Proceedings English 2011 We address the problem of enhancing the classical bag-of-words representation of texts by designing and engineering Tagme, the first system that performs an accurate and on-the-fly semantic annotation of short texts using Wikipedia as knowledge base. Several experiments show that Tagme outperforms state-of-the-art algorithms when they are adapted to work on short texts, and it proves fast and competitive on long ones. This leads us to argue favorably about Tagme's application to clustering, classification and retrieval systems in challenging scenarios like web snippets, tweets, news, ads, etc. 0 0
From names to entities using thematic context distance Pilz A.
Paass G.
International Conference on Information and Knowledge Management, Proceedings English 2011 Name ambiguity arises from the polysemy of names and causes uncertainty about the true identity of entities referenced in unstructured text. This is a major problem in areas like information retrieval or knowledge management, for example when searching for a specific entity or updating an existing knowledge base. We approach this problem of named entity disambiguation (NED) using thematic information derived from Latent Dirichlet Allocation (LDA) to compare the entity mention's context with candidate entities in Wikipedia represented by their respective articles. We evaluate various distances over topic distributions in a supervised classification setting to find the best suited candidate entity, which is either covered in Wikipedia or unknown. We compare our approach to a state of the art method and show that it achieves significantly better results in predictive performance, regarding both entities covered in Wikipedia as well as uncovered entities. We show that our approach is in general language independent as we obtain equally good results for named entity disambiguation using the English, the German and the French Wikipedia. 0 0
ITEM: Extract and integrate entities from tabular data to RDF knowledge base Guo X.
Yirong Chen
Jilin Chen
Du X.
Lecture Notes in Computer Science English 2011 Many RDF knowledge bases are created and enlarged by mining and extracting web data. Hence their data sources are limited to social tagging networks, such as Wikipedia, WordNet, IMDB, etc., and their precision is not guaranteed. In this paper, we propose a new system, ITEM, for extracting and integrating entities from tabular data into an RDF knowledge base. ITEM can efficiently compute the schema mapping between a table and a KB, and inject novel entities into the KB. Therefore, ITEM can enlarge and improve an RDF KB by employing tabular data, which is assumed to be of high quality. ITEM detects the schema mapping between table and RDF KB using only tuples, rather than the table's schema information. Experimental results show that our system has high precision and good performance. 0 0
Identifying aspects for web-search queries Fei Wu
Madhavan J.
Halevy A.
Journal of Artificial Intelligence Research English 2011 Many web-search queries serve as the beginning of an exploration of an unknown space of information, rather than looking for a specific web page. To answer such queries effectively, the search engine should attempt to organize the space of relevant information in a way that facilitates exploration. We describe the Aspector system that computes aspects for a given query. Each aspect is a set of search queries that together represent a distinct information need relevant to the original search query. To serve as an effective means to explore the space, Aspector computes aspects that are orthogonal to each other and have high combined coverage. Aspector combines two sources of information to compute aspects. We discover candidate aspects by analyzing query logs, and cluster them to eliminate redundancies. We then use a mass-collaboration knowledge base (e.g., Wikipedia) to compute candidate aspects for queries that occur less frequently and to group together aspects that are likely to be "semantically" related. We present a user study that indicates that the aspects we compute are rated favorably against three competing alternatives - related searches proposed by Google, cluster labels assigned by the Clusty search engine, and navigational searches proposed by Bing. © 2011 AI Access Foundation. All rights reserved. 0 0
Knowledge Base Population: Successful approaches and challenges Ji H.
Grishman R.
ACL-HLT 2011 - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies English 2011 In this paper we give an overview of the Knowledge Base Population (KBP) track at the 2010 Text Analysis Conference. The main goal of KBP is to promote research in discovering facts about entities and augmenting a knowledge base (KB) with these facts. This is done through two tasks, Entity Linking - linking names in context to entities in the KB -and Slot Filling - adding information about an entity to the KB. A large source collection of newswire and web documents is provided from which systems are to discover information. Attributes ("slots") derived from Wikipedia infoboxes are used to create the reference KB. In this paper we provide an overview of the techniques which can serve as a basis for a good KBP system, lay out the remaining challenges by comparison with traditional Information Extraction (IE) and Question Answering (QA) tasks, and provide some suggestions to address these challenges. 0 0
Multipedia: Enriching DBpedia with multimedia information Garcia-Silva A.
Max Jakob
Mendes P.N.
Christian Bizer
KCAP 2011 - Proceedings of the 2011 Knowledge Capture Conference English 2011 Enriching knowledge bases with multimedia information makes it possible to complement textual descriptions with visual and audio information. Such complementary information can help users to understand the meaning of assertions, and in general improve the user experience with the knowledge base. In this paper we address the problem of how to enrich ontology instances with candidate images retrieved from existing Web search engines. DBpedia has evolved into a major hub in the Linked Data cloud, interconnecting millions of entities organized under a consistent ontology. Our approach taps into the Wikipedia corpus to gather context information for DBpedia instances and takes advantage of image tagging information when this is available to calculate semantic relatedness between instances and candidate images. We performed experiments with focus on the particularly challenging problem of highly ambiguous names. Both methods presented in this work outperformed the baseline. Our best method leveraged context words from Wikipedia, tags from Flickr and type information from DBpedia to achieve an average precision of 80%. 0 0
Shortipedia aggregating and curating Semantic Web data Vrandecic D.
Ratnakar V.
Krotzsch M.
Gil Y.
Journal of Web Semantics English 2011 Shortipedia is a Web-based knowledge repository that pulls together a growing number of sources in order to provide a comprehensive, diversified view on entities of interest. Contributors to Shortipedia can easily add claims to the knowledge base, provide sources for their claims, and find links to knowledge already available on the Semantic Web. © 2011 Elsevier B.V. All rights reserved. 0 0
Temporal knowledge for timely intelligence Gerhard Weikum
Bedathur S.
Ralf Schenkel
Lecture Notes in Business Information Processing English 2011 Knowledge bases about entities and their relationships are a great asset for business intelligence. Major advances in information extraction and the proliferation of knowledge-sharing communities like Wikipedia have enabled ways for the largely automated construction of rich knowledge bases. Such knowledge about entity-oriented facts can greatly improve the output quality and possibly also efficiency of processing business-relevant documents and event logs. This holds for information within the enterprise as well as in Web communities such as blogs. However, no knowledge base will ever be fully complete and real-world knowledge is continuously changing: new facts supersede old facts, knowledge grows in various dimensions, and completely new classes, relation types, or knowledge structures will arise. This leads to a number of difficult research questions regarding temporal knowledge and the life-cycle of knowledge bases. This short paper outlines challenging issues and research opportunities, and provides references to technical literature. 0 0
An efficient web-based wrapper and annotator for tabular data Amin M.S.
Jamil H.
International Journal of Software Engineering and Knowledge Engineering English 2010 In the last few years, several works in the literature have addressed the problem of data extraction from web pages. The importance of this problem derives from the fact that, once extracted, data can be handled in a way similar to instances of a traditional database, which in turn can facilitate web data integration and various other domain-specific applications. In this paper, we propose a novel table extraction technique that works on web pages generated dynamically from a back-end database. The proposed system can automatically discover table structure by relevant pattern mining from web pages in an efficient way, and can generate regular expressions for the extraction process. Moreover, the proposed system can assign intuitive column names to the columns of the extracted table by leveraging the Wikipedia knowledge base for the purpose of table annotation. To improve the accuracy of the assignment, we exploit the structural homogeneity of the column values and their co-location information to weed out less likely candidates. This approach requires no human intervention and experimental results have shown its accuracy to be promising. Moreover, the wrapper generation algorithm works in linear time. 0 0
An evidence-based approach to collaborative ontology development Tonkin E.
Pfeiffer H.D.
Hewson A.
Proceedings of the International Symposium on Matching and Meaning Automated Development, Evolution and Interpretation of Ontologies - A Symposium at the AISB 2010 Convention English 2010 The development of ontologies for various purposes is now a relatively commonplace process. A number of different approaches towards this aim are evident; empirical methodologies, giving rise to data-driven procedures; self-reflective (innate) methodologies, resulting in artifacts that are based on intellectual understanding; collaborative approaches, which result in the development of an artifact representing a consensus viewpoint. We compare and contrast these approaches through two parallel use cases, in work that is currently ongoing. The first explores a case study in creation of a knowledge base from raw, semi-structured information available on the Web. This makes use of text and data mining approaches from various sources of information, including semi-formally structured metadata, interpreted using methods drawn from statistical analysis, and data drawn from crowd-sourced resources such as Wikipedia. The second explores ontology development in the area of physical computing, specifically, context-awareness in ubiquitous computing, and focuses on exploring the significant impact of an evidence-led approach. Both examples are chosen from domains in which automated extraction of information is a significant use case for the resulting ontology. In the first case, automated extraction takes the form of indexing for search and browse of the archived data. In the second, the predominant use cases relate to context-awareness. Via these examples, we identify a core set of design principles for software platforms that bring together evidence from each of these processes, exploring participatory development of ontologies intended for use in domains in which empirical evidence and user judgment are allied. 0 0
Approaches for automatically enriching wikipedia Zareen Syed
Tim Finin
AAAI Workshop - Technical Report English 2010 We have been exploring the use of Web-derived knowledge bases through the development of Wikitology - a hybrid knowledge base of structured and unstructured information extracted from Wikipedia augmented by RDF data from DBpedia and other Linked Open Data resources. In this paper, we describe approaches that aid in enriching Wikipedia and thus the resources that derive from Wikipedia such as the Wikitology knowledge base, DBpedia, Freebase and Powerset. Copyright © 2010, Association for the Advancement of Artificial Intelligence. All rights reserved. 0 0
Chapter 3: Search for knowledge Gerhard Weikum Lecture Notes in Computer Science English 2010 There are major trends to advance the functionality of search engines to a more expressive semantic level. This is enabled by the advent of knowledge-sharing communities such as Wikipedia and the progress in automatically extracting entities and relationships from semistructured as well as natural-language Web sources. In addition, Semantic-Web-style ontologies, structured Deep-Web sources, and Social-Web networks and tagging communities can contribute towards a grand vision of turning the Web into a comprehensive knowledge base that can be efficiently searched with high precision. This vision and position paper discusses opportunities and challenges along this research avenue. The technical issues to be looked into include knowledge harvesting to construct large knowledge bases, searching for knowledge in terms of entities and relationships, and ranking the results of such queries. 0 0
Collective cross-document relation extraction without labelled data Yao L.
Riedel S.
McCallum A.
EMNLP 2010 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference English 2010 We present a novel approach to relation extraction that integrates information across documents, performs global inference and requires no labelled text. In particular, we tackle relation extraction and entity identification jointly. We use distant supervision to train a factor graph model for relation extraction based on an existing knowledge base (Freebase, derived in parts from Wikipedia). For inference we run an efficient Gibbs sampler that leads to linear time joint inference. We evaluate our approach both for an in-domain (Wikipedia) and a more realistic out-of-domain (New York Times Corpus) setting. For the in-domain setting, our joint model leads to 4% higher precision than an isolated local approach, but has no advantage over a pipeline. For the out-of-domain data, we benefit strongly from joint modelling, and observe improvements in precision of 13% over the pipeline, and 15% over the isolated baseline. 0 0
Creating and exploiting a Web of semantic data Tim Finin
Zareen Syed
ICAART 2010 - 2nd International Conference on Agents and Artificial Intelligence, Proceedings English 2010 Twenty years ago Tim Berners-Lee proposed a distributed hypertext system based on standard Internet protocols. The Web that resulted fundamentally changed the ways we share information and services, both on the public Internet and within organizations. That original proposal contained the seeds of another effort that has not yet fully blossomed: a Semantic Web designed to enable computer programs to share and understand structured and semi-structured information easily. We will review the evolution of the idea and technologies to realize a Web of Data and describe how we are exploiting them to enhance information retrieval and information extraction. A key resource in our work is Wikitology, a hybrid knowledge base of structured and unstructured information extracted from Wikipedia. 0 0
Do wikipedians follow domain experts?: A domain-specific study on wikipedia knowledge building YanChun Zhang
Aixin Sun
Anwitaman Datta
Kuiyu Chang
Lim E.-P.
Proceedings of the ACM International Conference on Digital Libraries English 2010 Wikipedia is one of the most successful online knowledge bases, attracting millions of visits daily. Not surprisingly, its huge success has in turn led to immense research interest in a better understanding of the collaborative knowledge building process. In this paper, we performed a (terrorism) domain-specific case study, comparing and contrasting the knowledge evolution in Wikipedia with a knowledge base created by domain experts. Specifically, we used the Terrorism Knowledge Base (TKB) developed by experts at MIPT. We identified 409 Wikipedia articles matching TKB records, and studied them from three aspects: creation, revision, and link evolution. We found that the knowledge building in Wikipedia had largely been independent, and did not follow TKB - despite the open and online availability of the latter, as well as awareness of at least some of the Wikipedia contributors about the TKB source. In an attempt to identify possible reasons, we conducted a detailed analysis of the contribution behavior demonstrated by Wikipedians. It was found that most Wikipedians contribute to a relatively small set of articles each. Their contributions were biased towards one or very few articles. At the same time, each article's contributions are often championed by very few active contributors including the article's creator. We finally arrive at the conjecture that contributions in Wikipedia aim more at covering knowledge at the article level than at the domain level. 0 1
Dynamic topic detection and tracking based on knowledge base Se Wang
Du J.
Liang M.
Long Chen
Proceedings - 2010 3rd IEEE International Conference on Broadband Network and Multimedia Technology, IC-BNMT2010 English 2010 In order to address the problem of sparse initial information when establishing a topic model, this paper builds a Wikipedia-based news event knowledge base. Referring to this knowledge base, we weight the news model, measure similarity based on time distance, cluster along the timeline, and apply a dynamic threshold strategy to detect and track topics in news material automatically. The experimental results verify the validity of this method. 0 0
Entity linking leveraging automatically generated annotation Weinan Zhang
Su J.
Tan C.L.
Wang W.T.
Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference English 2010 Entity linking refers entity mentions in a document to their representations in a knowledge base (KB). In this paper, we propose to use additional information sources from Wikipedia to find more name variations for the entity linking task. In addition, as manually creating a training corpus for entity linking is labor-intensive and costly, we present a novel method to automatically generate large-scale corpus annotation for ambiguous mentions by leveraging their unambiguous synonyms in the document collection. Then, a binary classifier is trained to filter out KB entities that are not similar to current mentions. This classifier not only can effectively reduce the ambiguities to the existing entities in the KB, but is also very useful for highlighting new entities for further KB population. Furthermore, we also leverage the Wikipedia documents to provide additional information which is not available in our generated corpus through a domain adaptation approach, which provides further performance improvements. The experiment results show that our proposed method outperforms the state-of-the-art approaches. 0 0
Entity-relationship queries over Wikipedia Li X.
Chenliang Li
Yu C.
International Conference on Information and Knowledge Management, Proceedings English 2010 Wikipedia is the largest user-generated knowledge base. We propose a structured query mechanism, the entity-relationship query, for searching entities in the Wikipedia corpus by their properties and inter-relationships. An entity-relationship query consists of an arbitrary number of predicates on desired entities. The semantics of each predicate is specified with keywords. An entity-relationship query searches entities directly over text rather than pre-extracted structured data stores. This characteristic brings two benefits: (1) query semantics can be intuitively expressed by keywords; (2) it avoids the information loss that happens during extraction. We present a ranking framework for general entity-relationship queries and a position-based Bounded Cumulative Model for accurate ranking of query answers. Experiments on INEX benchmark queries and our own crafted queries show the effectiveness and accuracy of our ranking method. 0 0
Exploiting wikipedia in query understanding systems Richard Khoury 2010 5th International Conference on Digital Information Management, ICDIM 2010 English 2010 In recent years, the free online encyclopaedia Wikipedia has become a standard resource for building knowledge bases for various Natural Language Processing applications. In this paper, we exploit that resource to design a new query classification system. We explain and justify the steps we take to extract information from Wikipedia into a structured database, in order to demonstrate the validity of our design. We then show, both with mathematical reasoning and with experimental results, how to exploit the information in the database for the purpose of query classification. 0 0
Finding new information via robust entity detection Iacobelli F.
Nichols N.
Birnbaum L.
Hammond K.
AAAI Fall Symposium - Technical Report English 2010 Journalists and editors work under pressure to collect relevant details and background information about specific events. They spend a significant amount of time sifting through documents and finding new information such as facts, opinions or stakeholders (i.e. people, places and organizations that have a stake in the news). Spotting them is a tedious and cognitively intense process. One task, essential to this process, is to find and keep track of stakeholders. This task is taxing cognitively and in terms of memory. Tell Me More offers an automatic aid for this task. Tell Me More is a system that, given a seed story, mines the web for similar stories reported by different sources and selects only those stories which offer new information with respect to the original seed story. Much as for a journalist, the task of detecting named entities is central to its success. In this paper we briefly describe Tell Me More and, in particular, we focus on Tell Me More's entity detection component. We describe an approach that combines off-the-shelf named entity recognizers (NERs) with WPED, an in-house publicly available NER that uses Wikipedia as its knowledge base. We show a significant increase in precision scores with respect to traditional NERs. Lastly, we present an overall evaluation of Tell Me More using this approach. Copyright © 2010, Association for the Advancement of Artificial Intelligence. All rights reserved. 0 0
From information to knowledge: Harvesting entities and relationships from web sources Gerhard Weikum
Martin Theobald
Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems English 2010 There are major trends to advance the functionality of search engines to a more expressive semantic level. This is enabled by the advent of knowledge-sharing communities such as Wikipedia and the progress in automatically extracting entities and relationships from semistructured as well as natural-language Web sources. Recent endeavors of this kind include DBpedia, EntityCube, KnowItAll, ReadTheWeb, and our own YAGO-NAGA project (and others). The goal is to automatically construct and maintain a comprehensive knowledge base of facts about named entities, their semantic classes, and their mutual relations as well as temporal contexts, with high precision and high recall. This tutorial discusses state-of-the-art methods, research opportunities, and open challenges along this avenue of knowledge harvesting. 0 0
Gathering and ranking photos of named entities with high precision, high recall, and diversity Taneva B.
Kacimi M.
Gerhard Weikum
WSDM 2010 - Proceedings of the 3rd ACM International Conference on Web Search and Data Mining English 2010 Knowledge-sharing communities like Wikipedia and automated extraction methods like those of DBpedia enable the construction of large machine-processible knowledge bases with relational facts about entities. These endeavors lack multimodal data like photos and videos of people and places. While photos of famous entities are abundant on the Internet, they are much harder to retrieve for less popular entities such as notable computer scientists or regionally interesting churches. Querying the entity names in image search engines yields large candidate lists, but they often have low precision and unsatisfactory recall. Our goal is to populate a knowledge base with photos of named entities, with high precision, high recall, and diversity of photos for a given entity. We harness relational facts about entities for generating expanded queries to retrieve different candidate lists from image search engines. We use a weighted voting method to determine better rankings of an entity's photos. Appropriate weights are dependent on the type of entity (e.g., scientist vs. politician) and automatically computed from a small set of training entities. We also exploit visual similarity measures based on SIFT features, for higher diversity in the final rankings. Our experiments with photos of persons and landmarks show significant improvements of ranking measures like MAP and NDCG, and also for diversity-aware ranking. Copyright 2010 ACM. 0 0
Implementing a wiki to capture and share engineering knowledge Catic A.
Malmqvist J.
Proceedings of NordDesign 2010, the 8th International NordDesign Conference English 2010 This paper describes the implementation of a wiki system based on the wiki engine MediaWiki for the purpose of engineering knowledge capture and sharing in an internal R&D unit that is part of a global group of companies in the commercial vehicle industry. Three different knowledge processes are studied; 1. Knowledge creation that is based on a socialization process that mainly creates tacit knowledge distributed across individuals; 2. Knowledge transfer that is based on reuse of tacit knowledge by physical transferral of knowledge holders; and 3. Knowledge application which entails a core team of individuals applying their collective knowledge base to solve a given problem. It is found that a wiki system's features of collaborative and web based input make it possible to support all three of the processes by making the tacit knowledge base explicit. It is concluded however that the implementation of a wiki also needs: 1. A structure that reflects the business processes in the unit; 2. A clear definition of knowledge as a deliverable in the processes; 3. A model for how the time spent on contributing to the wiki is financed; and 4. A strategy for tackling corporate IT governance policies' inability to manage interactive Web 2.0 technologies. 0 0
Learning from the web: Extracting general world knowledge from noisy text Gordon J.
Van Durme B.
Schubert L.K.
AAAI Workshop - Technical Report English 2010 The quality and nature of knowledge that can be found by an automated knowledge-extraction system depends on its inputs. For systems that learn by reading text, the Web offers a breadth of topics and currency, but it also presents the problems of dealing with casual, unedited writing, non-textual inputs, and the mingling of languages. The results of extraction using the KNEXT system on two Web corpora - Wikipedia and a collection of weblog entries - indicate that, with automatic filtering of the output, even ungrammatical writing on arbitrary topics can yield an extensive knowledge base, which human judges find to be of good quality, with propositions receiving an average score across both corpora of 2.34 (where the range is 1 to 5 and lower is better) versus 3.00 for unfiltered output from the same sources. Copyright © 2010, Association for the Advancement of Artificial Intelligence. All rights reserved. 0 0
Learning to find interesting connections in Wikipedia Marek Ciglan
Rivierez E.
Norvag K.
Advances in Web Technologies and Applications - Proceedings of the 12th Asia-Pacific Web Conference, APWeb 2010 English 2010 To help users answer the question of what the relation between (real-world) entities or concepts is, we might need to go well beyond the borders of traditional information retrieval systems. In this paper, we explore the possibility of exploiting the Wikipedia link graph as a knowledge base for finding interesting connections between two or more given concepts, described by Wikipedia articles. We use a modified Spreading Activation algorithm to identify connections between input concepts. The main challenge in our approach lies in assessing the strength of a relation defined by a link between articles. We propose two approaches for link weighting and evaluate their results with a user evaluation. Our results show a strong correlation between the weighting methods used and user preferences; the results indicate that the Wikipedia link graph can be used as a valuable semantic resource. 0 0
MENTA: inducing multilingual taxonomies from wikipedia Gerard de Melo
Gerhard Weikum
CIKM English 2010 0 0
Mining wikipedia knowledge to improve document indexing and classification Ayyasamy R.K.
Tahayna B.
Alhashmi S.
Eu-Gene S.
Egerton S.
10th International Conference on Information Sciences, Signal Processing and their Applications, ISSPA 2010 English 2010 Web logs are an important source of information that requires automatic techniques to categorize them into "topic-based" content, to facilitate their future browsing and retrieval. In this paper we propose and illustrate the effectiveness of a new tf.idf measure. The proposed Conf.idf and Catf.idf measures are solely based on the mapping of terms-to-concepts-to-categories (TCONCAT) method that utilizes Wikipedia. The knowledge base Wikipedia is a large-scale Web encyclopaedia with a huge number of high-quality articles and categorical indexes. Using this resource, our proposed framework consists of two stages to solve the weblog classification problem. The first stage is to find the terms belonging to a unique concept (article), as well as to disambiguate the terms belonging to more than one concept. The second stage is the determination of the categories to which these found concepts belong. Experimental results confirm that the proposed system can efficiently distinguish web logs that belong to more than one category and has better performance and success than traditional statistical Natural Language Processing (NLP) approaches. 0 0
Requirements for semantic web applications in engineering David Fowler
Crowder R.M.
Tao Guan
Shadbolt N.
Gary Wills
Proceedings of the ASME Design Engineering Technical Conference English 2010 In this paper we describe some applications of Semantic Web technologies for the engineering design community. Specifically, we use Semantic Wikis to form a central knowledge base, which other applications then refer to. The developed applications include an advisor for performing Computational Fluid Dynamics simulations, a Semantic search engine, and an assistant for airfoil design. In the conclusions we discuss lessons learned and, subsequently, requirements for future systems. 0 0
Semantic sense extraction from Wikipedia pages Pirrone R.
Pipitone A.
Russo G.
3rd International Conference on Human System Interaction, HSI'2010 - Conference Proceedings English 2010 This paper discusses a way to access and organize unstructured content related to a particular topic obtained from Wikipedia pages. The proposed approach is focused on the acquisition of new knowledge from Wikipedia pages and is based on the definition of useful patterns able to extract and identify novel concepts and relations to be added to the knowledge base. We propose a method that uses information from the wiki page's structure. For the different parts of the page we define different strategies to obtain new concepts or relations between them. We analyze not only the structure but also the text directly, to obtain relations and concepts and to extract the type of relations to be incorporated in a domain ontology. The purpose is to use the obtained information in an intelligent tutoring system to improve its capabilities in dialogue management with users. 0 0
Timely YAGO: Harvesting, querying, and visualizing temporal knowledge from Wikipedia Yafang Wang
Mingjie Zhu
Qu L.
Marc Spaniol
Gerhard Weikum
Advances in Database Technology - EDBT 2010 - 13th International Conference on Extending Database Technology, Proceedings English 2010 Recent progress in information extraction has shown how to automatically build large ontologies from high-quality sources like Wikipedia. But knowledge evolves over time; facts have associated validity intervals. Therefore, ontologies should include time as a first-class dimension. In this paper, we introduce Timely YAGO, which extends our previously built knowledge base YAGO with temporal aspects. This prototype system extracts temporal facts from Wikipedia infoboxes, categories, and lists in articles, and integrates these into the Timely YAGO knowledge base. We also support querying temporal facts, by temporal predicates in a SPARQL-style language. Visualization of query results is provided in order to better understand the dynamic nature of knowledge. Copyright 2010 ACM. 0 0
Towards meta-engineering for semantic wikis Jochen Reutelshoefer
Joachim Baumeister
Frank Puppe
CEUR Workshop Proceedings English 2010 Building intelligent systems is a complex task. In many knowledge engineering projects the knowledge acquisition activities can significantly benefit from a tool that is tailored to the specific project setting with respect to domain, contributors, and goals. Specifying and building a new tool from scratch is ambitious, tedious, and delaying. In this paper we introduce a wiki-based meta-engineering approach that allows knowledge acquisition activities to begin smoothly alongside tool specification and tailored implementation. Meta-engineering proposes that in a wiki-based knowledge engineering project not only the content (the knowledge base) but also the tool itself should be developed in an evolutionary manner. 0 0
Using encyclopaedic knowledge for query classification Richard Khoury Proceedings of the 2010 International Conference on Artificial Intelligence, ICAI 2010 English 2010 Identifying the intended topic that underlies a user's query can benefit a large range of applications, from search engines to question-answering systems. However, query classification remains a difficult challenge due to the variety of queries a user can ask, the wide range of topics users can ask about, and the limited amount of information that can be mined from the query. In this paper, we develop a new query classification system that accounts for these three challenges. Our system relies on encyclopaedic knowledge to understand the user's query and fill in the gaps of missing information. Specifically, we use the freely-available online encyclopaedia Wikipedia as a natural-language knowledge base, and exploit Wikipedia's structure to infer the correct classification of any user query. 0 0
Using wikipedia categories for compact representations of chemical documents Kohncke B.
Balke W.-T.
International Conference on Information and Knowledge Management, Proceedings English 2010 Today, Web pages are usually accessed using text search engines, whereas documents stored in the deep Web are accessed through domain-specific Web portals. These portals rely on external knowledge bases or ontologies, mapping documents to more general concepts and thus allowing for suitable classifications and navigational browsing. Since automatically generated ontologies are still not satisfactory for advanced information retrieval tasks, most portals heavily rely on hand-crafted domain-specific ontologies. This, however, also leads to high creation and maintenance costs. On the other hand, a freely available, community-maintained, if somewhat general, knowledge base is offered by Wikipedia. In recent years the coverage of Wikipedia has grown into a large pool of information including articles from almost all domains. In this paper, we investigate the use of Wikipedia categories to describe the content of chemical documents in a compact form. We compare the results to the domain-specific ChEBI ontology, and the results show that Wikipedia categories indeed allow useful descriptions of chemical documents that are even better than descriptions from the ChEBI ontology. 0 0
WikiPop - Personalized Event Detection System Based on Wikipedia Page View Statistics Marek Ciglan
Kjetil Nørvåg
English 2010 In this paper, we describe WikiPop, a system designed to detect significant increases in the popularity of topics related to users' interests. We exploit Wikipedia page view statistics to identify concepts with a significant increase of interest from the public. Daily, there are thousands of articles with increased popularity; thus, personalization is in order to provide the user only with results related to his/her interest. The WikiPop system allows a user to define a context by stating a set of Wikipedia articles describing topics of interest. The system is then able to search, for the given date, for popular topics related to the user-defined context. 0 1
WikiPop - Personalized event detection system based on Wikipedia page view statistics Marek Ciglan
Norvag K.
International Conference on Information and Knowledge Management, Proceedings English 2010 In this paper, we describe the WikiPop service, a system designed to detect significant increases in the popularity of topics related to users' interests. We exploit Wikipedia page view statistics to identify concepts with a significant increase of interest from the public. Daily, there are thousands of articles with increased popularity; thus, personalization is in order to provide the user only with results related to his/her interest. The WikiPop system allows a user to define a context by stating a set of Wikipedia articles describing topics of interest. The system is then able to search, for the given date, for popular topics related to the user-defined context. 0 1
A knowledge workbench for software development Panagiotou D.
Mentzas G.
Proceedings of I-KNOW 2009 - 9th International Conference on Knowledge Management and Knowledge Technologies and Proceedings of I-SEMANTICS 2009 - 5th International Conference on Semantic Systems English 2009 Modern software development is highly knowledge intensive; it requires that software developers create and share new knowledge during their daily work. However, current software development environments are "syntactic", i.e. they do not facilitate understanding the semantics of software artefacts and hence cannot fully support the knowledge-driven activities of developers. In this paper we present KnowBench, a knowledge workbench environment which focuses on the software development domain and strives to address these problems. KnowBench aims at providing software developers with such a tool to ease their daily work and facilitate the articulation and visualization of software artefacts, concept-based source code documentation and related problem solving. A knowledge base of software artefacts built with the KnowBench system can then be exploited by semantic search engines or P2P metadata infrastructures in order to foster the dissemination of software development knowledge and facilitate cooperation among software developers. 0 0
An interactive semantic knowledge base unifying wikipedia and HowNet Hongzhi Guo
Qingcai Chen
Lei Cui
Xiaolong Wang
ICICS 2009 - Conference Proceedings of the 7th International Conference on Information, Communications and Signal Processing English 2009 We present an interactive, exoteric semantic knowledge base which integrates HowNet and the online encyclopedia Wikipedia. The semantic knowledge base mainly builds on items, categories, attributes and the relations between them. During construction, a mapping is established from HowNet and Wikipedia to the new knowledge base. Unlike other online encyclopedias or knowledge dictionaries, the categories in the semantic knowledge base are semantically tagged, which makes them well suited to semantic analysis and semantic computing. Currently the knowledge base built in this paper contains more than 200,000 items and 1,000 categories, and these are still increasing every day. 0 0
Building knowledge base for Vietnamese information retrieval Nguyen T.C.
Le H.M.
Phan T.T.
IiWAS2009 - The 11th International Conference on Information Integration and Web-based Applications and Services English 2009 At present, the Vietnamese knowledge base (vnKB) is one of the most important focuses of Vietnamese researchers because of its applications in wide areas such as Information Retrieval (IR), Machine Translation (MT), etc. There have been several separate projects developing the vnKB in various domains. Training the vnKB is the main difficulty because of the quantity and quality of training data, and the lack of an available Vietnamese corpus of acceptable quality. This paper introduces an approach which first extracts semantic information from the Vietnamese Wikipedia (vnWK), and then trains the proposed vnKB by applying the support vector machine (SVM) technique. Experimentation with the proposed approach shows that it is a potential solution because of its good results, and proves that it can provide further valuable benefits when applied to our Vietnamese Semantic Information Retrieval system. 0 0
Classifying web pages by using knowledge bases for entity retrieval Kiritani Y.
Ma Q.
Masatoshi Yoshikawa
Lecture Notes in Computer Science English 2009 In this paper, we propose a novel method to classify Web pages by using knowledge bases for entity search, which is a kind of typical Web search for information related to a person, location or organization. First, we map a Web page to entities according to the similarities between the page and the entities. Various methods for computing such similarity are applied. For example, we can compute the similarity between a given page and a Wikipedia article describing a certain entity. The frequency of an entity appearing in the page is another factor used in computing the similarity. Second, we construct a directed acyclic graph, named PEC graph, based on the relations among Web pages, entities, and categories, by referring to YAGO, a knowledge base built on Wikipedia and WordNet. Finally, by analyzing the PEC graph, we classify Web pages into categories. The results of some preliminary experiments validate the methods proposed in this paper. 0 0
Communication process and collaborative work in web 2.0 environment Kim E.
Jinhyun Ahn
Lee D.
ACM International Conference Proceeding Series English 2009 Because a higher level of media richness improves the performance of collaborative work such as knowledge sharing, efforts to raise media richness are encouraged. The Channel Expansion Theory argues that individuals' perceptions of media richness vary according to each individual's knowledge base built from prior experiences related to the communication situation. This study explored the channel expansion effects in the new CMC environment, Web 2.0. In particular, we considered communication process modes (i.e., conveyance and convergence) as a factor moderating the effects. The research model was verified by an experiment with student subjects. 0 0
DBpedia live extraction Sebastian Hellmann
Claus Stadler
Janette Lehmann
Sören Auer
Lecture Notes in Computer Science English 2009 The DBpedia project extracts information from Wikipedia, interlinks it with other knowledge bases, and makes this data available as RDF. So far the DBpedia project has succeeded in creating one of the largest knowledge bases on the Data Web, which is used in many applications and research prototypes. However, the heavy-weight extraction process has been a drawback. It requires manual effort to produce a new release and the extracted information is not up-to-date. We extended DBpedia with a live extraction framework, which is capable of processing tens of thousands of changes per day in order to consume the constant stream of Wikipedia updates. This allows direct modifications of the knowledge base and closer interaction of users with DBpedia. We also show how the Wikipedia community itself is now able to take part in the DBpedia ontology engineering process and that an interactive roundtrip engineering between Wikipedia and DBpedia is made possible. 0 0
Effective extraction of thematically grouped key terms from text Maria Grineva
Maxim Grinev
Dmitry Lizorkin
AAAI Spring Symposium - Technical Report English 2009 We present a novel method for extraction of key terms from text documents. The important and novel feature of our method is that it produces groups of key terms, where each group contains key terms semantically related to one of the main themes of the document. Our method is based on a combination of the following two techniques: a Wikipedia-based semantic relatedness measure of terms and an algorithm for detecting the community structure of a network. One of the advantages of our method is that it does not require any training, as it works upon the Wikipedia knowledge base. Our experimental evaluation using human judgments shows that our method produces key terms with high precision and recall. 0 0
Employee knowledge: Instantly searchable Hoffman J. Digital Energy Conference and Exhibition 2009 English 2009 The online encyclopedia, Wikipedia, has proven the value of the world community contributing to an instantly searchable world knowledge base. The same technology can be applied to the company community: each individual sharing strategic tips directly related to company interests that are then instantly searchable. Each employee can share, using Microsoft Sharepoint Wiki Pages, those unique hints, tips, tricks, and knowledge that they feel could be of the highest value to other employees: how-to's and shortcuts in company software packages, learnings from pilot projects (successful or not), links to fantastic resources, etc. This growing knowledge base then becomes an instantly searchable, global resource for the entire company. Occidental of Elk Hills, Inc. just recently, October 15, 2008, started a rollout of Wiki page use at its Elk Hills, CA, USA properties. There are over 300 employees at Elk Hills and its Wiki Home Page received over 1500 hits in its first day, with multiple employees contributing multiple articles. Employees are already talking about time-savers they have learned and applied. A second presentation was demanded by those that missed the first. The rollout has generated a buzz of excitement and interest that we will be encouraging into the indefinite future. The significance of a corporate knowledge base can be major: high-tech professionals not spending hours figuring out how to do what someone else has already figured out and documented, support personnel not having to answer the same questions over and over again but having only to point those asking to steps already documented, employees learning time-saving tips that they may never have learned or thought of, professionals no longer wasting time searching for results of other trials or having to reinvent the wheel. Time is money. Knowledge is power. Applying Wiki technology to corporate knowledge returns time and knowledge to the workforce leading to bottom line benefits and powerful corporate growth. 0 0
Harvesting, searching, and ranking knowledge on the web Gerhard Weikum Proceedings of the 2nd ACM International Conference on Web Search and Data Mining, WSDM'09 English 2009 There are major trends to advance the functionality of search engines to a more expressive semantic level (e.g., [2, 4, 6, 7, 8, 9, 13, 14, 18]). This is enabled by employing large-scale information extraction [1, 11, 20] of entities and relationships from semistructured as well as natural-language Web sources. In addition, harnessing Semantic-Web-style ontologies [22] and reaching into Deep-Web sources [16] can contribute towards a grand vision of turning the Web into a comprehensive knowledge base that can be efficiently searched with high precision. This talk presents ongoing research towards this objective, with emphasis on our work on the YAGO knowledge base [23, 24] and the NAGA search engine [14] but also covering related projects. YAGO is a large collection of entities and relational facts that are harvested from Wikipedia and WordNet with high accuracy and reconciled into a consistent RDF-style "semantic" graph. For further growing YAGO from Web sources while retaining its high quality, pattern-based extraction is combined with logic-based consistency checking in a unified framework [25]. NAGA provides graph-template-based search over this data, with powerful ranking capabilities based on a statistical language model for graphs. Advanced queries and the need for ranking approximate matches pose efficiency and scalability challenges that are addressed by algorithmic and indexing techniques [15, 17]. YAGO is publicly available and has been imported into various other knowledge-management projects including DBpedia. YAGO shares many of its goals and methodologies with parallel projects along related lines. These include Avatar [19], Cimple/DBlife [10, 21], DBpedia [3], KnowItAll/TextRunner [12, 5], Kylin/KOG [26, 27], and the Libra technology [18, 28] (and more). Together they form an exciting trend towards providing comprehensive knowledge bases with semantic search capabilities. copyright 2009 ACM. 0 0
KB video retrieval at TRECVID 2009 Etter D. 2009 TREC Video Retrieval Evaluation Notebook Papers English 2009 This paper describes the KB Video Retrieval system for the TRECVID 2009 evaluation. Our research focus this year was on query expansion using external knowledge bases. 0 0
Large-scale cross-media retrieval of WikipediaMM images with textual and visual query expansion Zhou Z.
Tian Y.
Yanyan Li
Huang T.
Gao W.
Lecture Notes in Computer Science English 2009 In this paper, we present our approaches for the WikipediaMM task at ImageCLEF 2008. We first experimented with a text-based image retrieval approach with query expansion, where the expansion terms were automatically selected from a knowledge base that was semi-automatically constructed from Wikipedia. Encouragingly, these experimental results ranked first among all submitted runs. We also implemented a content-based image retrieval approach with query-dependent visual concept detection. Cross-media retrieval was then successfully carried out by independently applying the two meta-search tools and combining the results through a weighted summation of scores. Though not submitted, this approach outperforms our text-based and content-based approaches remarkably. 0 0
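The text-based query-expansion step can be sketched as follows, assuming the Wikipedia-derived knowledge base has already been reduced to a mapping from a term to weighted related terms; this is a simplification for illustration, not the structure the authors actually used.

```python
# Sketch of text-query expansion against a Wikipedia-derived knowledge base,
# modelled here simply as {term: {related term: relatedness score}}.
def expand_query(query_terms, kb, top_k=3, weight=0.5):
    """Return (term, weight) pairs: original terms plus their top expansions."""
    expanded = {t: 1.0 for t in query_terms}
    for term in query_terms:
        related = sorted(kb.get(term, {}).items(), key=lambda kv: -kv[1])[:top_k]
        for rel_term, score in related:
            # Down-weight expansion terms so they cannot dominate the query.
            expanded[rel_term] = max(expanded.get(rel_term, 0.0), weight * score)
    return sorted(expanded.items(), key=lambda kv: -kv[1])

kb = {"eiffel tower": {"paris": 0.9, "landmark": 0.7, "france": 0.6}}
print(expand_query(["eiffel tower"], kb))
```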
Mining concepts from Wikipedia for ontology construction Gaoying Cui
Lu Q.
Li W.
Yirong Chen
Proceedings - 2009 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT Workshops 2009 English 2009 An ontology is a structured knowledge base of concepts organized by the relations among them. However, concepts are usually mixed with their instances in the corpora used for knowledge extraction. Concepts and their corresponding instances share similar features and are difficult to distinguish. In this paper, a novel approach is proposed to comprehensively obtain concepts with the help of definition sentences and Category Labels in Wikipedia pages. N-gram statistics and other NLP knowledge are used to help extract appropriate concepts. The proposed method identified nearly 50,000 concepts from about 700,000 Wiki pages. With precision reaching 78.5%, it is an effective approach for mining concepts from Wikipedia for ontology construction. 0 0
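A rough illustration of this kind of concept mining, with made-up heuristics: take the head noun phrase of a page's definition sentence and keep it only if it also occurs in a category label. The regex and the overlap test below are assumptions, not the paper's n-gram statistics.

```python
# Illustrative concept filtering: take the head of a page's definition
# sentence ("X is a <concept> ...") and keep it only if it also occurs in a
# category label. The regex and overlap test are simplistic placeholders.
import re

def definition_head(first_sentence):
    """Naive head extraction from 'X is a/an/the <concept> ...' definitions."""
    m = re.search(r"\bis (?:a|an|the)\s+([a-z][a-z ]+?)(?:\s+(?:of|in|that|which|with)\b|[.,])",
                  first_sentence.lower())
    return m.group(1).strip() if m else None

def mine_concepts(pages, category_labels):
    """pages: {title: first sentence}; category_labels: set of label strings."""
    concepts = set()
    labels = [label.lower() for label in category_labels]
    for title, sentence in pages.items():
        head = definition_head(sentence)
        if head and any(head in label for label in labels):
            concepts.add(head)
    return concepts

pages = {"Haskell": "Haskell is a functional programming language."}
print(mine_concepts(pages, {"Functional programming languages"}))
```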
SASL: A semantic annotation system for literature Yuan P.
Gang Wang
Zhang Q.
Jin H.
Lecture Notes in Computer Science English 2009 Due to ambiguity, search engines for scientific literature may not return the right search results. One efficient solution to this problem is to automatically annotate the literature and attach semantic information to it. Generally, semantic annotation requires identifying entities before attaching semantic information to them. However, due to abbreviations and other reasons, it is very difficult to identify entities correctly. The paper presents a Semantic Annotation System for Literature (SASL), which utilizes Wikipedia as a knowledge base to annotate literature. SASL mainly attaches semantics to terminology, academic institutions, conferences, journals, etc. Many of these are usually abbreviations, which induces ambiguity. SASL uses regular expressions to extract the mapping between the full names of entities and their abbreviations. Since the full names of several entities may map to a single abbreviation, SASL introduces a Hidden Markov Model to implement name disambiguation. Finally, the paper presents experimental results, which confirm that SASL achieves good performance. 0 0
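The regular-expression stage can be illustrated with a hypothetical pattern for "Full Name (ABBR)" mentions whose initials match the abbreviation; the pattern is an assumption rather than SASL's actual rule set, and the HMM disambiguation stage is not shown.

```python
# Hypothetical pattern for mining "Full Name (ABBR)" pairs; pairs are kept
# only when the capitalised words' initials equal the abbreviation. The HMM
# disambiguation stage described above is not shown here.
import re

PAIR = re.compile(r"((?:[A-Z][\w-]+\s+){1,6}[A-Z][\w-]+)\s*\(([A-Z]{2,8})\)")

def abbreviation_map(text):
    mapping = {}
    for full, abbr in PAIR.findall(text):
        initials = "".join(word[0] for word in full.split()).upper()
        if initials == abbr:
            mapping.setdefault(abbr, set()).add(full)
    return mapping

text = ("We train a Support Vector Machine (SVM) and compare it with a "
        "Hidden Markov Model (HMM) on the same data.")
print(abbreviation_map(text))
```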
Semantic based adaptive movie summarisation Ren R.
Misra H.
Jose J.M.
Lecture Notes in Computer Science English 2009 This paper proposes a framework for automatic video summarization by exploiting internal and external textual descriptions. The web knowledge base Wikipedia is used as a middle media layer, which bridges the gap between general user descriptions and exact film subtitles. Latent Dirichlet Allocation (LDA) detects and matches the distribution of content topics in Wikipedia items and movie subtitles. A saliency-based summarization system then selects perceptually attractive segments from each content topic for summary composition. The evaluation collection consists of six English movies, and high topic coverage is shown over official trailers from the Internet Movie Database. 0 0
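A sketch of the LDA matching step using gensim, under the assumption that Wikipedia-derived descriptions and subtitle segments are already available as plain text; tokenisation is deliberately naive and the saliency-based segment selection is omitted.

```python
# Sketch of LDA-based topic matching with gensim: fit topics on the
# Wikipedia-derived descriptions, then pick the best-matching subtitle
# segment for each topic. Preprocessing and saliency scoring are omitted.
from gensim import corpora, models

def tokenize(text):
    return [w for w in text.lower().split() if len(w) > 3]

def match_segments_to_topics(wiki_docs, subtitle_segments, num_topics=5):
    texts = [tokenize(d) for d in wiki_docs]
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]
    lda = models.LdaModel(corpus, id2word=dictionary,
                          num_topics=num_topics, passes=10, random_state=0)

    best = {}                                   # topic id -> (probability, segment index)
    for i, segment in enumerate(subtitle_segments):
        bow = dictionary.doc2bow(tokenize(segment))
        for topic_id, prob in lda.get_document_topics(bow):
            if prob > best.get(topic_id, (0.0, None))[0]:
                best[topic_id] = (prob, i)
    return {topic: idx for topic, (_, idx) in best.items()}
```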
Supporting personal semantic annotations in P2P semantic wikis Torres D.
Hala Skaf-Molli
Diaz A.
Pascal Molli
Lecture Notes in Computer Science English 2009 In this paper, we propose to extend Peer-to-Peer Semantic Wikis with personal semantic annotations. Semantic wikis are one of the most successful Semantic Web applications. In semantic wikis, wiki pages are annotated with semantic data to facilitate navigation, information retrieval and ontology emergence. The semantic data represents the shared knowledge base which describes the common understanding of the community. However, in a collaborative knowledge building process the knowledge is basically created by individuals who are involved in a social process. Therefore, it is fundamental to support personal knowledge building in a differentiated way. Currently there are no available semantic wikis that support both personal and shared understandings. In order to overcome this problem, we propose a P2P collaborative knowledge building process and extend semantic wikis with personal annotation facilities to express personal understanding. In this paper, we detail the personal semantic annotation model and show its implementation in P2P semantic wikis. We also detail an evaluation study which shows that personal annotations demand less cognitive effort than semantic data and are very useful for enriching the shared knowledge base. 0 0
Using wikitology for cross-document entity coreference resolution Tim Finin
Zareen Syed
Mayfield J.
Mcnamee P.
Piatko C.
AAAI Spring Symposium - Technical Report English 2009 We describe the use of the Wikitology knowledge base as a resource for a variety of applications with special focus on a cross-document entity coreference resolution task. This task involves recognizing when entities and relations mentioned in different documents refer to the same object or relation in the world. Wikitology is a knowledge base system constructed with material from Wikipedia, DBpedia and Freebase that includes both unstructured text and semi-structured information. Wikitology was used to define features that were part of a system implemented by the Johns Hopkins University Human Language Technology Center of Excellence for the 2008 Automatic Content Extraction cross-document coreference resolution evaluation organized by National Institute of Standards and Technology. 0 0
Wiki tools in the preparation and support of e-learning courses Jancarik A.
Jancarikova K.
8th European Conference on eLearning 2009, ECEL 2009 English 2009 Wiki tools, which became known mainly thanks to the Wikipedia encyclopedia, represent quite a new phenomenon on the Internet. The work presented here deals with three areas connected to a possible use of wiki tools for the preparation of an e-learning course. The first area asks to what extent Wikipedia.com contains terms necessary for scientific lectures at the university level and to what extent they are localised into other languages. The second area covers the use of Wikipedias specialised in one theme as a knowledge base for e-learning study materials. Our experience with Enviwiki, which originated within the E-V Learn project, and its use in e-learning courses is presented. The third area aims at the use of wiki tools for building a knowledge base and sharing the experience of the participants of an e-learning course. 0 0
How smart is a smart card? Naone E. Technology Review English 2008 Simson L. Garfinkel explores Wikipedia's epistemology and discovers that, far from being a free-for-all, the world's most popular reference is decidedly rigid. In its effort to ensure accuracy, Wikipedia relies entirely on verifiability, requiring that all factual claims include a citation to another published source. There are, of course, times when the consensus view and the truth align perfectly. The problem is how to determine when this is the case. This diversity of thought and action is what Wikipedia has tried to establish in building its vast and ever-expanding knowledge base. By letting anyone contribute, regardless of his or her credentials, it runs the risk that absurdities, inconsistencies, and misinformation will flourish. H. B. Phillips, the former head of MIT's mathematics department, reveals that when there is considerable difference of opinion, there is no evidence that the intellectuals supply any better answers than ordinary people. 0 0
Introduction to the special issue on managing information extraction Doan A.
Gravano L.
Ramakrishnan R.
Vaithyanathan S.
SIGMOD Record English 2008 The special issue of SIGMOD Record, December 2008, focuses on managing information extraction (IE) with the help of nine papers, which are grouped into IE management systems, novel IE technologies, building knowledge bases with IE, and Web-scale open IE. IE programs extract structured data from unstructured text, and 'SystemT: A System for Declarative Information Extraction' describes three IE management systems currently under development at IBM Almaden, Wisconsin and Yahoo! Research. The research paper 'Webpage Understanding: Beyond Page Level Search' describes a powerful set of learning-based techniques that can be used to extract structured data from Web pages. The paper 'Using Wikipedia to Bootstrap Open Information Extraction' reveals that all current open-IE systems adopt a structural targeting approach. The issue also describes a paper on Kylin, an open-IE system under development at the University of Washington, which adopts the traditional approach of relational targeting. 0 0
NAGA: Harvesting, searching and ranking knowledge Gjergji Kasneci
Suchanek F.M.
Ifrim G.
Elbassuoni S.
Maya Ramanath
Gerhard Weikum
Proceedings of the ACM SIGMOD International Conference on Management of Data English 2008 The presence of encyclopedic Web sources, such as Wikipedia, the Internet Movie Database (IMDB), World Factbook, etc. calls for new querying techniques that are simple and yet more expressive than those provided by standard keyword-based search engines. Searching for explicit knowledge needs to consider inherent semantic structures involving entities and relationships. In this demonstration proposal, we describe a semantic search system named NAGA. NAGA operates on a knowledge graph, which contains millions of entities and relationships derived from various encyclopedic Web sources, such as the ones above. NAGA's graph-based query language is geared towards expressing queries with additional semantic information. Its scoring model is based on the principles of generative language models, and formalizes several desiderata such as confidence, informativeness and compactness of answers. We propose a demonstration of NAGA which will allow users to browse the knowledge base through a user interface, enter queries in NAGA's query language and tune the ranking parameters to test various ranking aspects. 0 0
Named entity disambiguation on an ontology enriched by Wikipedia Nguyen H.T.
Cao T.H.
RIVF 2008 - 2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies English 2008 Currently, the shortage of training data is a problem for named entity disambiguation. This paper presents a novel method that overcomes this problem by automatically generating an annotated corpus based on a specific ontology. The corpus is then enriched with new and informative features extracted from Wikipedia data. Moreover, rather than pursuing rule-based methods as in the literature, we employ a machine learning model to not only disambiguate but also identify named entities. In addition, our method explores in detail the use of a range of features extracted from texts, a given ontology, and Wikipedia data for disambiguation. This paper also systematically analyzes the impact of the features on disambiguation accuracy by varying their combinations for representing named entities. Empirical evaluation shows that, while the ontology provides basic features of named entities, Wikipedia is a fertile source of additional features for constructing accurate and robust named entity disambiguation systems. 0 0
Semantic relatedness metric for Wikipedia concepts based on link analysis and its application to word sense disambiguation Denis Turdakov
Pavel Velikhov
CEUR Workshop Proceedings English 2008 Wikipedia has grown into a high-quality, up-to-date knowledge base and can enable many knowledge-based applications that rely on semantic information. One of the most general and quite powerful semantic tools is a measure of semantic relatedness between concepts. Moreover, the ability to efficiently produce a list of ranked similar concepts for a given concept is very important for a wide range of applications. We propose to use a simple measure of similarity between Wikipedia concepts, based on Dice's measure, and provide very efficient heuristic methods to compute top-k ranking results. Furthermore, since our heuristics are based on statistical properties of scale-free networks, we show that these heuristics are applicable to other complex ontologies. Finally, in order to evaluate the measure, we have used it to solve the problem of word-sense disambiguation. Our approach to word sense disambiguation is based solely on the similarity measure and produces results with high accuracy. 0 1
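Dice's measure over Wikipedia link sets is easy to state concretely; the sketch below uses a brute-force top-k scan rather than the paper's scale-free-network heuristics, and the link index is a toy stand-in.

```python
# Dice relatedness over Wikipedia link sets, with a brute-force top-k scan.
# The paper's heuristics for avoiding the full pairwise scan are not shown.
import heapq

def dice(links_a, links_b):
    """Dice coefficient: 2 * |A intersect B| / (|A| + |B|)."""
    if not links_a and not links_b:
        return 0.0
    return 2.0 * len(links_a & links_b) / (len(links_a) + len(links_b))

def top_k_related(concept, link_index, k=10):
    """link_index: {concept: set of linked article titles}."""
    source = link_index[concept]
    scores = ((dice(source, links), other)
              for other, links in link_index.items() if other != concept)
    return heapq.nlargest(k, scores)

link_index = {
    "Jaguar (animal)": {"Felidae", "Panthera", "South America"},
    "Leopard": {"Felidae", "Panthera", "Africa"},
    "Jaguar Cars": {"Coventry", "Land Rover", "V8 engine"},
}
print(top_k_related("Jaguar (animal)", link_index, k=2))
```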
Discovering unknown connections - The DBpedia relationship finder Janette Lehmann
Schuppel J.
Sören Auer
The Social Semantic Web 2007 - Proceedings of the 1st Conference on Social Semantic Web, CSSW 2007 English 2007 The Relationship Finder is a tool for exploring connections between objects in a Semantic Web knowledge base. It offers a new way to get insights about elements in an ontology, in particular for large amounts of instance data. For this reason, we applied the idea to the DBpedia data set, which contains an enormous amount of knowledge extracted from Wikipedia. We describe the workings of the Relationship Finder algorithm and present some interesting statistical discoveries about DBpedia and Wikipedia. 0 0
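The core operation, finding a connection between two objects in the knowledge base, can be approximated by treating RDF triples as an undirected labelled graph and returning a shortest path. The real Relationship Finder works against the DBpedia store, so this is only a schematic sketch with toy data.

```python
# Sketch of relationship finding over RDF-style triples: treat them as an
# undirected labelled graph and return a shortest connecting path. The actual
# Relationship Finder queries the DBpedia knowledge base directly.
import networkx as nx

def build_graph(triples):
    g = nx.Graph()
    for s, p, o in triples:
        g.add_edge(s, o, predicate=p)
    return g

def connection(triples, a, b):
    g = build_graph(triples)
    nodes = nx.shortest_path(g, a, b)
    return [(u, g[u][v]["predicate"], v) for u, v in zip(nodes, nodes[1:])]

triples = [
    ("Berlin", "capitalOf", "Germany"),
    ("Germany", "memberOf", "European Union"),
    ("Paris", "capitalOf", "France"),
    ("France", "memberOf", "European Union"),
]
print(connection(triples, "Berlin", "Paris"))
```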
Exploiting web 2.0 for all knowledge-based information retrieval Milne D.N. International Conference on Information and Knowledge Management, Proceedings English 2007 This paper describes ongoing research into obtaining and using knowledge bases to assist information retrieval. These structures are prohibitively expensive to obtain manually, yet automatic approaches have been researched for decades with limited success. This research investigates a potential shortcut: a way to provide knowledge bases automatically, without expecting computers to replace expert human indexers. Instead we aim to replace the professionals with thousands or even millions of amateurs: with the growing community of contributors who form the core of Web 2.0. Specifically we focus on Wikipedia, which represents a rich tapestry of topics and semantics and a huge investment of human effort and judgment. We show how this can be directly exploited to provide manually defined yet inexpensive knowledge bases that are specifically tailored to expose the topics, terminology and semantics of individual document collections. We are also concerned with how best to make these structures available to users, and aim to produce a complete knowledge-based retrieval system (both the knowledge base and the tools to apply it) that can be evaluated by how well it assists real users in performing realistic and practical information retrieval tasks. To this end we have developed Koru, a new search engine that offers concrete evidence of the effectiveness of our Web 2.0-based techniques for assisting information retrieval. 0 0
MSG-052 knowledge network for federation architecture and design Ohlund G.
Lofstrand B.
Hassaine F.
Fall Simulation Interoperability Workshop 2007 English 2007 Development of distributed simulations is a complex process requiring extensive experience, in-depth knowledge and a certain skills set for the Architecture, Design, development and systems integration required for a federation to meet its operational, functional and technical requirements. Federation architecture and design is the blueprint that forms the basis for federation-wide agreements on how to conceive and build a federation. Architecture and design issues are continuously being addressed during federation development. Knowledge of "good design" is gained through hands-on experience, trial-and-error and experimentation. This kind of knowledge however, is seldom reused and rarely shared in an effective way. This paper presents an ongoing effort conducted by MSG-052 "Knowledge Network for Federation Architecture and Design" within the NATO Research and Technology Organisation (NATO/RTO) Modelling and Simulation group (NMSG). The main objective of MSG-052 is to initiate a "Knowledge Network" to promote development and sharing of information and knowledge about common federation architecture and design issues among NATO/PfP (Partnership for Peace) countries. By Knowledge Network, we envision a combination of a Community of Practice (CoP), various organisations and Knowledge Bases. A CoP, consisting of federation development experts from the NATO/PfP nations, will foster the development of state-of-the-art federation architecture and design solutions, and provide a Knowledge Base for the Modelling and Simulation (M&S) community as a whole. As part of the work, existing structures and tools for knowledge capture, management and utilization will be explored, refined and used when appropriate; for instance the work previously done under MSG-027 PATHFINDER Integration Environment provides lessons learned that could benefit this group. The paper will explore the concept of a Community of Practice and reveal the ideas and findings within the MSG-052 Management Group concerning ways of establishing and managing a Federation Architecture and Design CoP. It will also offer several views on the concept of operations for a collaborative effort, combining voluntary contributions as well as assigned tasks. Amongst the preliminary findings was the notion of a Wiki-based Collaborative Environment in which a large portion of our work is conducted and which also represents our current Knowledge Base. Finally, we present some of our main challenges and vision for future work. 0 0
Reduce response time: Get "hooked" on a wiki Rebecca Klein
Marc Smith
David Sierkowski
Proceedings ACM SIGUCCS User Services Conference English 2007 Managing the flow of information both within the IT department and to our customers is one of our greatest challenges in the Office of Technology Information at Valparaiso University. To be successful, IT staff first need to acquire the right information from colleagues to provide excellent service. Then, the staff must determine the most effective way to communicate that information to internal and external customers to encourage the flow of information. To advance the IT department's goals, how best can we utilize "information" and "communication" vehicles to exchange information, improve workflow, and ultimately communicate essential information to our internal and external customers? We've asked ourselves this question and have resolved that "information" and "communication" need to work cooperatively! How better than with a wiki. Recent changes in departmental structure gave us the opportunity to examine our communication vehicles - specifically the software tools we use to facilitate the flow of information. Our previous knowledge base, First Level Support, a module of the HEAT support software produced by FrontRange Solutions, once met our needs as an internal knowledge base solution. We realized we had outgrown FLS and needed a more robust alternative. Our student employees asked for a newer, more interactive method of sharing information. With the assistance of our UNIX systems administrator, we investigated various options and decided to implement the MediaWiki system. As we had anticipated, use of this wiki system reduced the response time a customer must wait for an answer to their inquiry. What we didn't realize was that utilization of the wiki would meet many more needs than we had anticipated. It has also helped us meet other departmental needs, such as increased collaboration, an online knowledge base, and a training tool for staff. Come see how a sprinkle of pixie dust improved communication through adoption of the wiki, and brought information to the forefront of our operations. 0 0
Reduce response time: get "hooked" on a wiki Rebecca Klein
Matthew Smith
David Sierkowski
SIGUCCS English 2007 0 0
Understanding member motivation for contributing to different types of virtual communities: A proposed framework Moore T.D.
Serva M.A.
Proceedings of the 2007 ACM SIGMIS Computer Personnel Research Conference: The Global Information Technology Workforce, SIGMIS-CPR 2007 English 2007 Previous research indicates that the type and purpose of a virtual community (wiki, blog, and Internet Forum) may play a role in determining a member's motivation for contribution to a virtual community, but does not fully explore this idea. This study aggregates the disparate ideas and terminology of previous research on virtual communities and presents a more parsimonious grouping of fourteen motivational factors. These fourteen factors provide a framework for examining what drives members to contribute. Two preliminary studies offer some support for the framework. Copyright 2007 ACM. 0 0
On integrating a semantic wiki in a knowledge management system De Paoli F.
Loregian M.
CEUR Workshop Proceedings English 2006 The use of knowledge management systems is often hampered by the heavy overhead of publishing information. In particular, uploading a document and then profiling it with a set of meta-data and keywords is a tedious and time-consuming activity. Therefore, one of the main goals for such systems should be to make publishing of explicit knowledge as natural as possible. In the project described in this paper, we exploit a semantic wiki editor to support document publishing by means of textual descriptions augmented by ontology-defined annotations. Such annotations are then managed as entries in metadata profiles. Moreover, we can publish semantic-wiki-based documents that do not require any further activity to be profiled and included in a knowledge base, as they are self-describing. The semantic wiki project is part of a collaborative knowledge management system that has been developed to support project teams and communities of interest. 0 0
Reusing ontological background knowledge in semantic wikis Vrandecic D.
Krotzsch M.
CEUR Workshop Proceedings English 2006 A number of approaches have been developed for combining wikis with semantic technologies. Many semantic wikis focus on enabling users to specify properties and relationships of individual elements. Complex schema information is typically not edited by the wiki user. Nevertheless, semantic wikis could benefit from taking existing schema information into account, and to allow users to specify additional information based on this schema. In this paper, we introduce an extension of Semantic MediaWiki that incorporates schema information from existing OWL ontologies. Based on the imported ontology, the system offers automatic classification of articles and aims at supporting the user in editing the wiki knowledge base in a logically consistent manner. We present our prototype implementation which uses the KAON2 ontology management system to integrate reasoning services into our wiki. 0 0
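The automatic-classification idea can be sketched with rdflib: an article asserted to belong to one imported class is also classified under all of that class's superclasses via the rdfs:subClassOf closure. This illustrates the principle only; it is not the Semantic MediaWiki/KAON2 implementation, and the example ontology is hypothetical.

```python
# Sketch of class-hierarchy-based classification with rdflib: an article with
# an asserted type is also classified under every superclass. Not the actual
# Semantic MediaWiki / KAON2 implementation.
from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.org/onto#")   # hypothetical imported ontology

def classify(graph, article):
    """Return the set of classes (asserted and inherited) for an article."""
    classes = set()
    for asserted in graph.objects(article, RDF.type):
        classes.add(asserted)
        classes.update(graph.transitive_objects(asserted, RDFS.subClassOf))
    return classes

g = Graph()
g.add((EX.Labrador, RDFS.subClassOf, EX.Dog))
g.add((EX.Dog, RDFS.subClassOf, EX.Animal))
g.add((EX.Rex, RDF.type, EX.Labrador))
print(sorted(str(c) for c in classify(g, EX.Rex)))
```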
Building collaborative knowledge bases: An open source approach using wiki software in teaching and research Mindel J.L.
Verma S.
Association for Information Systems - 11th Americas Conference on Information Systems, AMCIS 2005: A Conference on a Human Scale English 2005 [No abstract available] 0 0