Linked data

From WikiPapers

Linked data is included as a keyword or extra keyword in 1 dataset, 0 tools and 42 publications.

Datasets

Dataset: DBpedia
Languages: Catalan, German, Greek, Spanish, French, Galician, Hungarian, Italian, Dutch, Polish, Portuguese, Russian, Slovenian, Turkish
Description: DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the web to Wikipedia data.
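The description above mentions asking sophisticated queries against Wikipedia data. As a minimal sketch of what such a query can look like in practice, the snippet below queries the public DBpedia SPARQL endpoint with the Python SPARQLWrapper library; the example query (films directed by Stanley Kubrick) is purely illustrative and not part of the dataset description.

```python
# A minimal sketch, assuming the public DBpedia SPARQL endpoint and the
# SPARQLWrapper library; the query itself is an arbitrary illustration.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>
    SELECT ?film WHERE {
        ?film dbo:director dbr:Stanley_Kubrick .
    }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["film"]["value"])
```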

Tools

There are no tools for this keyword.


Publications

Title Author(s) Published in Language Date Abstract R C
Using linked data to mine RDF from Wikipedia's tables
Munoz E., Hogan A., Mileo A.
WSDM 2014 - Proceedings of the 7th ACM International Conference on Web Search and Data Mining English 2014 The tables embedded in Wikipedia articles contain rich, semi-structured encyclopaedic content. However, the cumulative content of these tables cannot be queried against. We thus propose methods to recover the semantics of Wikipedia tables and, in particular, to extract facts from them in the form of RDF triples. Our core method uses an existing Linked Data knowledge-base to find pre-existing relations between entities in Wikipedia tables, suggesting the same relations as holding for other entities in analogous columns on different rows. We find that such an approach extracts RDF triples from Wikipedia's tables at a raw precision of 40%. To improve the raw precision, we define a set of features for extracted triples that are tracked during the extraction phase. Using a manually labelled gold standard, we then test a variety of machine learning methods for classifying correct/incorrect triples. One such method extracts 7.9 million unique and novel RDF triples from over one million Wikipedia tables at an estimated precision of 81.5%. 0 0
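The abstract above describes using an existing Linked Data knowledge base to find relations that already hold between entities appearing in the same table row, and then suggesting those relations for analogous columns in other rows. The sketch below illustrates that core idea against DBpedia; it is not the authors' implementation, and the entity URIs and example pairs are hypothetical.

```python
# Illustrative sketch (not the authors' implementation): look up which
# DBpedia predicates already connect entity pairs drawn from two table
# columns, then suggest the most frequent predicate for the remaining rows.
from collections import Counter
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://dbpedia.org/sparql"

def predicates_between(subj_uri, obj_uri):
    """Return the predicates that link two resources in DBpedia."""
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery(f"SELECT ?p WHERE {{ <{subj_uri}> ?p <{obj_uri}> . }}")
    sparql.setReturnFormat(JSON)
    rows = sparql.query().convert()["results"]["bindings"]
    return [r["p"]["value"] for r in rows]

# Entity pairs taken from two columns of a hypothetical Wikipedia table.
column_pairs = [
    ("http://dbpedia.org/resource/Berlin", "http://dbpedia.org/resource/Germany"),
    ("http://dbpedia.org/resource/Paris", "http://dbpedia.org/resource/France"),
]

counts = Counter(p for s, o in column_pairs for p in predicates_between(s, o))
if counts:
    best, _ = counts.most_common(1)[0]
    print("Suggested relation for analogous rows:", best)
```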
Discovering missing semantic relations between entities in Wikipedia
Xu M., Zhe Wang, Bie R., Jing-Woei Li, Zheng C., Ke W., Zhou M.
Lecture Notes in Computer Science English 2013 Wikipedia's infoboxes contain rich structured information of various entities, which have been explored by the DBpedia project to generate large scale Linked Data sets. Among all the infobox attributes, those attributes having hyperlinks in their values identify semantic relations between entities, which are important for creating RDF links between DBpedia's instances. However, quite a few hyperlinks have not been annotated by editors in infoboxes, which leaves many relations between entities missing in Wikipedia. In this paper, we propose an approach for automatically discovering the missing entity links in Wikipedia's infoboxes, so that the missing semantic relations between entities can be established. Our approach first identifies entity mentions in the given infoboxes, and then computes several features to estimate the possibility that a given attribute value might link to a candidate entity. A learning model is used to obtain the weights of different features, and predict the destination entity for each attribute value. We evaluated our approach on the English Wikipedia data; the experimental results show that our approach can effectively find the missing relations between entities, and it significantly outperforms the baseline methods in terms of both precision and recall. 0 0
Extraction of linked data triples from Japanese Wikipedia text of ukiyo-e painters
Kimura F., Mitsui K., Maeda A.
Proceedings - 2013 International Conference on Culture and Computing, Culture and Computing 2013 English 2013 DBpedia provides Linked Data extracted from info boxes in Wikipedia articles. Extraction is easier from an infobox than from text because an info box has a fixed-format table to represent structured information. To provide more Linked Data, we propose a method for Linked Data triple extraction from Wikipedia text. In this study, we conducted an experiment to extract Linked Data triples from Wikipedia text of ukiyo-e painters and achieved precision of 0.605. 0 0
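For readers unfamiliar with the output format targeted by work like the entry above, the snippet below shows how a single extracted statement can be serialised as a Linked Data (RDF) triple with the Python rdflib library. The subject, predicate and object URIs are illustrative choices, not triples reported in the paper.

```python
# A minimal sketch with rdflib; the URIs are illustrative, not extracted data.
from rdflib import Graph, Namespace

DBR = Namespace("http://dbpedia.org/resource/")
DBO = Namespace("http://dbpedia.org/ontology/")

g = Graph()
g.bind("dbr", DBR)
g.bind("dbo", DBO)

# e.g. a birth-place statement that text extraction might produce for a painter
g.add((DBR["Katsushika_Hokusai"], DBO["birthPlace"], DBR["Edo"]))

print(g.serialize(format="turtle"))
```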
Filling the gaps among DBpedia multilingual chapters for question answering
Cojan J., Cabrio E., Fabien Gandon
Proceedings of the 3rd Annual ACM Web Science Conference, WebSci 2013 English 2013 To publish information extracted from multilingual pages of Wikipedia in a structured way, the Semantic Web community has started an effort of internationalization of DBpedia. Multilingual chapters of DBpedia can in fact contain different information with respect to the English version, in particular they provide more specificity on certain topics, or fill information gaps. DBpedia multilingual chapters are well connected through instance interlinking, extracted from Wikipedia. An alignment between properties is also carried out by DBpedia contributors as a mapping from the terms used in Wikipedia to a common ontology, enabling the exploitation of information coming from the multilingual chapters of DBpedia. However, the mapping process is currently incomplete, it is time consuming since it is manually performed, and may lead to the introduction of redundant terms in the ontology, as it becomes difficult to navigate through the existing vocabulary. In this paper we propose an approach to automatically extend the existing alignments, and we integrate it in a question answering system over linked data. We report on experiments carried out applying the QAKiS (Question Answering wiKiframework-based) system on the English and French DBpedia chapters, and we show that the use of such approach broadens its coverage. Copyright 2013 ACM. 0 0
Finding relevant missing references in learning courses
Siehndel P., Kawase R., Hadgu A.T., Herder E.
WWW 2013 Companion - Proceedings of the 22nd International Conference on World Wide Web English 2013 Reference sites play an increasingly important role in learning processes. Teachers use these sites in order to identify topics that should be covered by a course or a lecture. Learners visit online encyclopedias and dictionaries to find alternative explanations of concepts, to learn more about a topic, or to better understand the context of a concept. Ideally, a course or lecture should cover all key concepts of the topic that it encompasses, but often time constraints prevent complete coverage. In this paper, we propose an approach to identify missing references and key concepts in a corpus of educational lectures. For this purpose, we link concepts in educational material to the organizational and linking structure of Wikipedia. Identifying missing resources enables learners to improve their understanding of a topic, and allows teachers to investigate whether their learning material covers all necessary concepts. 0 0
Semantic message passing for generating linked data from tables
Mulwad V., Tim Finin, Joshi A.
Lecture Notes in Computer Science English 2013 We describe work on automatically inferring the intended meaning of tables and representing it as RDF linked data, making it available for improving search, interoperability and integration. We present implementation details of a joint inference module that uses knowledge from the linked open data (LOD) cloud to jointly infer the semantics of column headers, table cell values (e.g., strings and numbers) and relations between columns. We also implement a novel Semantic Message Passing algorithm which uses LOD knowledge to improve existing message passing schemes. We evaluate our implemented techniques on tables from the Web and Wikipedia. 0 0
A Linked Data platform for mining software repositories
Keivanloo I., Forbes C., Hmood A., Erfani M., Neal C., Peristerakis G., Rilling J.
IEEE International Working Conference on Mining Software Repositories English 2012 The mining of software repositories involves the extraction of both basic and value-added information from existing software repositories. The repositories will be mined to extract facts by different stakeholders (e.g. researchers, managers) and for various purposes. To avoid unnecessary pre-processing and analysis steps, sharing and integration of both basic and value-added facts are needed. In this research, we introduce SeCold, an open and collaborative platform for sharing software datasets. SeCold provides the first online software ecosystem Linked Data platform that supports data extraction and on-the-fly inter-dataset integration from major version control, issue tracking, and quality evaluation systems. In its first release, the dataset contains about two billion facts, such as source code statements, software licenses, and code clones from 18 000 software projects. In its second release the SeCold project will contain additional facts mined from issue trackers and versioning systems. Our approach is based on the same fundamental principle as Wikipedia: researchers and tool developers share analysis results obtained from their tools by publishing them as part of the SeCold portal and therefore make them an integrated part of the global knowledge domain. The SeCold project is an official member of the Linked Data dataset cloud and is currently the eighth largest online dataset available on the Web. 0 0
Building a large scale knowledge base from Chinese Wiki Encyclopedia
Zhe Wang, Jing-Woei Li, Pan J.Z.
Lecture Notes in Computer Science English 2012 DBpedia has proved to be a successful structured knowledge base, and large scale Semantic Web data has been built by using DBpedia as the central interlinking hub of the Web of Data in English. In Chinese, however, due to the heavy imbalance in size (no more than one tenth) between the English and Chinese Wikipedia, little Chinese linked data has been published and linked to DBpedia, which hinders structured knowledge sharing both within Chinese resources and across languages. This paper aims at building a large scale Chinese structured knowledge base from Hudong, one of the largest Chinese wiki encyclopedia websites. An upper-level ontology schema in Chinese is first learned based on the category system and infobox information in Hudong. In total, 19542 concepts are inferred, organized in a hierarchy of up to 20 levels. 2381 properties with domain and range information are learned from the attributes in the Hudong infoboxes. Then, 802593 instances are extracted and described using the concepts and properties in the learned ontology. These extracted instances cover a wide range of things, including persons, organizations, places and so on. Among all the instances, 62679 are linked to identical instances in DBpedia. Moreover, the paper provides an RDF dump and SPARQL access to the established Chinese knowledge base. The general upper-level ontology and wide coverage make the knowledge base a valuable Chinese semantic resource. It can not only be used for Chinese linked data building, the fundamental work for constructing multilingual knowledge bases across heterogeneous resources of different languages, but can also facilitate many useful applications of large-scale knowledge bases such as knowledge question answering and semantic search. 0 0
DBpedia ontology enrichment for inconsistency detection
Topper G., Knuth M., Sack H.
ACM International Conference Proceeding Series English 2012 In recent years the Web of Data experiences an extraordinary development: an increasing amount of Linked Data is available on the World Wide Web (WWW) and new use cases are emerging continually. However, the provided data is only valuable if it is accurate and without contradictions. One essential part of the Web of Data is DBpedia, which covers the structured data of Wikipedia. Due to its automatic extraction based on Wikipedia resources that have been created by various contributors, DBpedia data often is error-prone. In order to enable the detection of inconsistencies this work focuses on the enrichment of the DBpedia ontology by statistical methods. Taken the enriched ontology as a basis the process of the extraction of Wikipedia data is adapted, in a way that inconsistencies are detected during the extraction. The creation of suitable correction suggestions should encourage users to solve existing errors and thus create a knowledge base of higher quality. Copyright 2012 ACM. 0 0
Design and Evaluation of an IR-Benchmark for SPARQL Queries with Fulltext Conditions
Mishra A., Gurajada S., Martin Theobald
International Conference on Information and Knowledge Management, Proceedings English 2012 In this paper, we describe our goals in introducing a new, annotated benchmark collection, with which we aim to bridge the gap between the fundamentally different aspects that are involved in querying both structured and unstructured data. This semantically rich collection, captured in a unified XML format, combines components (unstructured text, semistructured infoboxes, and category structure) from 3.1 Million Wikipedia articles with highly structured RDF properties from both DBpedia and YAGO2. The new collection serves as the basis of the INEX 2012 Ad-hoc, Faceted Search, and Jeopardy retrieval tasks. With a focus on the new Jeopardy task, we particularly motivate the usage of the collection for question-answering (QA) style retrieval settings, which we also exemplify by introducing a set of 90 QA-style benchmark queries which come shipped in a SPARQL-based query format that has been extended by fulltext filter conditions. 0 0
English-to-traditional Chinese cross-lingual link discovery in articles with Wikipedia corpus
Chen L.-P., Shih Y.-L., Chen C.-T., Ku T., Hsieh W.-T., Chiu H.-S., Yang R.-D.
Proceedings of the 24th Conference on Computational Linguistics and Speech Processing, ROCLING 2012 English 2012 In this paper, we design a processing flow to produce linked data in articles, providing additional information for anchor-based terms and related terms in different languages (English to Chinese). Wikipedia has been a very important corpus and knowledge bank. Although Wikipedia describes itself as neither a dictionary nor an encyclopedia, it is of high potential value in applications and data mining research. Link discovery is a useful IR application, based on data mining and NLP algorithms, and has been used in several fields. According to the results of our experiment, this method does improve the results. 0 0
Extraction of historical events from Wikipedia
Hienert D., Luciano F.
CEUR Workshop Proceedings English 2012 The DBpedia project extracts structured information from Wikipedia and makes it available on the web. Information is gathered mainly with the help of infoboxes that contain structured information of the Wikipedia article. A lot of information is only contained in the article body and is not yet included in DBpedia. In this paper we focus on the extraction of historical events from Wikipedia articles that are available for about 2,500 years for different languages. We have extracted about 121,000 events with more than 325,000 links to DBpedia entities and provide access to this data via a Web API, SPARQL endpoint, Linked Data Interface and in a timeline application. 0 0
Linking folksonomies to knowledge organization systems
Jakob Voss
Communications in Computer and Information Science English 2012 This paper demonstrates enrichment of set-model folksonomies with hierarchical links and mappings to other knowledge organization systems. The process is exemplified with social tagging practice in Wikipedia and in Stack Exchange. The extended folksonomies are created by crowdsourcing tag names and descriptions to translate them to linked data in SKOS. 0 0
MapXplore: Linked data in the app store
Veres C.
CEUR Workshop Proceedings English 2012 MapXplore is an attempt to build a mainstream, useful, and easy to use application that uses linked data at its core. As such, it will be one of very few such apps on the Apple, or for that matter any of the mobile app stores. The purpose of the application is to allow users to browse any part of the globe, and identify points of interest from DBPedia articles. They can then drill down into traditional as well as linked data sources to get a comprehensive view of the points of interest. We note some difficulties in working with current linked data resources, and suggest some methods to help pave a future rich in popular, easy to use mobile semantic applications. By using these guidelines we aim to keep refining MapXplore to make it a showcase application for the power of linked data. 0 0
Models for efficient semantic data storage demonstrated on concrete example of DBpedia
Lasek I., Vojtas P.
CEUR Workshop Proceedings English 2012 In this paper, we introduce a benchmark to test efficiency of RDF data model for data storage and querying in relation to a concrete dataset. We created Czech DBpedia - a freely available dataset composed of data extracted from Czech Wikipedia. But during creation and querying of this dataset, we faced problems caused by a lack of performance of used RDF storage. We designed metrics to measure efficiency of data storage approaches. Our metric quantifies the impact of data decomposition in RDF triples. Results of our benchmark applied to the dataset of Czech DBpedia are presented. 0 0
Online sharing and integration of results from mining software repositories
Keivanloo I.
Proceedings - International Conference on Software Engineering English 2012 The mining of software repositories involves the extraction of both basic and value-added information from existing software repositories. Depending on stakeholders (e.g., researchers, management), these repositories are mined several times for different application purposes. To avoid unnecessary pre-processing steps and improve productivity, sharing, and integration of extracted facts and results are needed. The motivation of this research is to introduce a novel collaborative sharing platform for software datasets that supports on-the-fly inter-datasets integration. We want to facilitate and promote a paradigm shift in the source code analysis domain, similar to the one by Wikipedia in the knowledge-sharing domain. In this paper, we present the SeCold project, which is the first online, publicly available software ecosystem Linked Data dataset. As part of this research, not only theoretical background on how to publish such datasets is provided, but also the actual dataset. SeCold contains about two billion facts, such as source code statements, software licenses, and code clones from over 18.000 software projects. SeCold is also an official member of the Linked Data cloud and one of the eight largest online Linked Data datasets available on the cloud. 0 0
Publishing statistical data on the web
Salas P.E.R., Marcel Martin, Mota F.M.D., Sören Auer, Breitman K., Casanova M.A.
Proceedings - IEEE 6th International Conference on Semantic Computing, ICSC 2012 English 2012 Statistical data is one of the most important sources of information, relevant for large numbers of stakeholders in the governmental, scientific and business domains alike. In this article, we overview how statistical data can be managed on the Web. With OLAP2 Data Cube and CSV2 Data Cube we present two complementary approaches on how to extract and publish statistical data. We also discuss the linking, repair and the visualization of statistical data. As a comprehensive use case, we report on the extraction and publishing on the Web of statistical data describing 10 years of life in Brazil. 0 0
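The entry above concerns publishing statistical data on the Web. A common vocabulary for this is the W3C RDF Data Cube vocabulary; the sketch below encodes one observation with rdflib under that vocabulary. The dataset, dimension and measure URIs in the ex: namespace, and the numeric value, are assumptions for illustration only and are not taken from the paper.

```python
# A minimal sketch of one RDF Data Cube observation; the ex: URIs and the
# numeric value are placeholders, not data from the paper.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

QB = Namespace("http://purl.org/linked-data/cube#")
EX = Namespace("http://example.org/stats/")

g = Graph()
g.bind("qb", QB)
g.bind("ex", EX)

# Declare the dataset and attach one observation to it.
g.add((EX["population-dataset"], RDF.type, QB["DataSet"]))

obs = EX["obs-2010"]
g.add((obs, RDF.type, QB["Observation"]))
g.add((obs, QB["dataSet"], EX["population-dataset"]))
g.add((obs, EX["refYear"], Literal("2010", datatype=XSD.gYear)))
g.add((obs, EX["population"], Literal(190000000, datatype=XSD.integer)))  # placeholder value

print(g.serialize(format="turtle"))
```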
Semantic CMS and wikis as platforms for linked learning
Bratsas C., Bamidis P., Dimou A., Antoniou I., Ioannidis L.
CEUR Workshop Proceedings English 2012 Although interoperability has always been a priority in e-learning, conventional Learning Management Systems are mostly geared towards the Standards for Learning Objects exchange and the integration among systems. The contingency for integration with other web applications and data is hardly foreseen. This prevents them, nowadays, from being flexible enough to adapt to the emergence of Linked Data standards and the advent of the Semantic Web in general, unless they radically change orientation. In contrast, Wikis, followed by Content Management Systems, proved to be more versatile in complying with the Semantic Web and Linked Data standards. These advancements, together with their modular architecture, turn Wikis and CMSs into a decent choice for modern e-learning solutions. MediaWiki and Drupal were customized and deployed in the Aristotle University of Thessaloniki to assess their potential in exposing the University's learning resources on the Web of Linked Data, in accordance with the Linked Universities Initiative. On the occasion of these two deployments, a thorough comparison of the platforms' potential to function as Learning Management Systems took place and is presented in this paper. 0 0
Validation and discovery of genotype-phenotype associations in chronic diseases using linked data
Pathak J., Kiefer R., Freimuth R., Chute C.
Studies in Health Technology and Informatics English 2012 This study investigates federated SPARQL queries over Linked Open Data (LOD) in the Semantic Web to validate existing, and potentially discover new genotype-phenotype associations from public datasets. In particular, we report our preliminary findings for identifying such associations for commonly occurring chronic diseases using the Online Mendelian Inheritance in Man (OMIM) and Database for SNPs (dbSNP) within the LOD knowledgebase and compare them with Gene Wiki for coverage and completeness. Our results indicate that Semantic Web technologies can play an important role for in-silico identification of novel disease-gene-SNP associations, although additional verification is required before such information can be applied and used effectively. © 2012 European Federation for Medical Informatics and IOS Press. All rights reserved. 0 0
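The study above runs federated SPARQL queries that combine several Linked Open Data sources at query time. The sketch below shows the general shape of such a query using the SPARQL 1.1 SERVICE keyword; the endpoint URLs and the predicates in the ex: namespace are placeholders, not the OMIM or dbSNP resources used in the paper.

```python
# Illustrative sketch of a federated SPARQL query: the SERVICE clause pulls
# triples from a second endpoint during query evaluation. Endpoint URLs and
# ex: predicates are placeholders, so the call at the end is left commented.
from SPARQLWrapper import SPARQLWrapper, JSON

FEDERATED_QUERY = """
PREFIX ex: <http://example.org/biomed/>
SELECT ?gene ?snp WHERE {
  ?gene ex:associatedWithDisease ?disease .   # evaluated at the local endpoint
  SERVICE <http://example.org/sparql/dbsnp> { # evaluated at a remote endpoint
    ?snp ex:locatedInGene ?gene .
  }
}
LIMIT 10
"""

sparql = SPARQLWrapper("http://example.org/sparql/omim")  # placeholder endpoint
sparql.setQuery(FEDERATED_QUERY)
sparql.setReturnFormat(JSON)
# results = sparql.query().convert()  # would execute against real endpoints
```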
Various approaches to text representation for named entity disambiguation
Lasek I., Vojtas P.
ACM International Conference Proceeding Series English 2012 In this paper, we focus on the problem of named entity disambiguation. We disambiguate named entities on a very detailed level. To each entity is assigned a concrete identifier of a corresponding Wikipedia article describing the entity. For such a fine grained disambiguation a correct representation of a context is crucial. We compare various context representations: bag of words representation, linguistic representation and structured co-occurrence representation of the context. Models for each representation are described and evaluated. 0 0
Wikidata: A new platform for collaborative data collection
Vrandecic D.
WWW'12 - Proceedings of the 21st Annual Conference on World Wide Web Companion English 2012 This year, Wikimedia starts to build a new platform for the collaborative acquisition and maintenance of structured data: Wikidata. Wikidata's prime purpose is to be used within the other Wikimedia projects, like Wikipedia, to provide well-maintained, high-quality data. The nature and requirements of the Wikimedia projects require to develop a few novel, or at least unusual features for Wikidata: Wikidata will be a secondary database, i.e. instead of containing facts it will contain references for facts. It will be fully internationalized. It will contain inconsistent and contradictory facts, in order to represent the diversity of knowledge about a given entity. Copyright is held by the author/owner(s). 0 0
Wikidata: a new platform for collaborative data collection
Denny Vrandečić
International conference companion on World Wide Web English 2012 This year, Wikimedia starts to build a new platform for the collaborative acquisition and maintenance of structured data: Wikidata. Wikidata's prime purpose is to be used within the other Wikimedia projects, like Wikipedia, to provide well-maintained, high-quality data. The nature and requirements of the Wikimedia projects require to develop a few novel, or at least unusual features for Wikidata: Wikidata will be a secondary database, i.e. instead of containing facts it will contain references for facts. It will be fully internationalized. It will contain inconsistent and contradictory facts, in order to represent the diversity of knowledge about a given entity. 0 0
A-R-E: The author-review-execute environment
Muller W., Rojas I., Eberhart A., Peter Haase, Schmidt M.
Procedia Computer Science English 2011 The Author-Review-Execute (A-R-E) is an innovative concept to offer under a single principle and platform an environment to support the life cycle of an (executable) paper; namely the authoring of the paper, its submission, the reviewing process, the author's revisions, its publication, and finally the study (reading/interaction) of the paper as well as extensions (follow ups) of the paper. It combines Semantic Wiki technology, a resolver that resolves links from parts of documents to executable code or to data, an anonymizing component to support the authoring and reviewing tasks, and web services providing link perennity. 0 0
Applying and extending semantic wikis for semantic web courses
Rutledge L., Oostenrijk R.
CEUR Workshop Proceedings English 2011 This work describes the application of semantic wikis in distant learning for Semantic Web courses. The resulting system focuses its application of existing and new wiki technology in making a wiki-based interface that demonstrates Semantic Web features. A new layer of wiki technology, called "OWL Wiki Forms" is introduced for this Semantic Web functionality in the wiki interface. This new functionality includes a form-based interface for editing Semantic Web ontologies. The wiki then includes appropriate data from these ontologies to extend existing wiki RDF export. It also includes ontology-driven creation of data entry and browsing interfaces for the wiki itself. As a wiki, the system provides the student an educational tool that students can use anywhere while still sharing access with the instructor and, optionally, other students. 0 0
Concept disambiguation exploiting semantic databases
Hossucu A.G., Ayyildiz H., Gokturk Z.O.
Proceedings of the International Workshop on Semantic Web Information Management, SWIM 2011 English 2011 This paper presents a novel approach for resolving ambiguities in concepts that already reside in semantic databases such as Freebase and DBpedia. Different from standard dictionaries and lexical databases, semantic databases provide a rich hierarchy of semantic relations in ontological structures. Our disambiguation approach decides on the implied sense by computing concept similarity measures as a function of semantic relations defined in ontological graph representation of concepts. Our similarity measures also utilize Wikipedia descriptions of concepts. We performed a preliminary experimental evaluation, measuring disambiguation success rate and its correlation with input text content. The results show that our method outperforms well-known disambiguation methods. 0 0
Creating and Exploiting a Hybrid Knowledge Base for Linked Data
Zareen Syed, Tim Finin
Communications in Computer and Information Science English 2011 Twenty years ago Tim Berners-Lee proposed a distributed hypertext system based on standard Internet protocols. The Web that resulted fundamentally changed the ways we share information and services, both on the public Internet and within organizations. That original proposal contained the seeds of another effort that has not yet fully blossomed: a Semantic Web designed to enable computer programs to share and understand structured and semi-structured information easily. We will review the evolution of the idea and technologies to realize a Web of Data and describe how we are exploiting them to enhance information retrieval and information extraction. A key resource in our work is Wikitology, a hybrid knowledge base of structured and unstructured information extracted from Wikipedia. 0 0
DBpedia Spotlight: Shedding Light on the Web of Documents
Pablo N. Mendes, Max Jakob, Andrés García-Silva, Christian Bizer
International Conference on Semantic Systems English 2011 0 0
Educational semantic wikis in the linked data age: The case of MSc Web Science program at Aristotle University of Thessaloniki
Bratsas C., Dimou A., Alexiadis G., Chrysou D.-E., Kavargyris K., Parapontis I., Bamidis P., Antoniou I.
CEUR Workshop Proceedings English 2011 Wikis are nowadays a mature technology and further well established as successful eLearning approaches that promote collaboration, fulfill the requirements of new trends in education and follow the theory of constructivism. Semantic Wikis on the other hand, are not yet thoroughly explored, but differentiate by offering an increased overall added value to the educational procedure and the course management. Their recent integration with the Linked Data cloud exhibits a potential to exceed their usual contribution and to render them into powerful eLearning tools as they expand their potentialities to the newly created educational LOD. Web Science Semantic Wiki constitutes a prime attempt to evaluate this potential and the benefits that Semantic Web and linked data bring in the field of education. 0 0
Leveraging community-built knowledge for type coercion in question answering
Kalyanpur A., Murdock J.W., Fan J., Welty C.
Lecture Notes in Computer Science English 2011 Watson, the winner of the Jeopardy! challenge, is a state-of-the-art open-domain Question Answering system that tackles the fundamental issue of answer typing by using a novel type coercion (TyCor) framework, where candidate answers are initially produced without considering type information, and subsequent stages check whether the candidate can be coerced into the expected answer type. In this paper, we provide a high-level overview of the TyCor framework and discuss how it is integrated in Watson, focusing on and evaluating three TyCor components that leverage the community built semi-structured and structured knowledge resources - DBpedia (in conjunction with the YAGO ontology), Wikipedia Categories and Lists. These resources complement each other well in terms of precision and granularity of type information, and through links to Wikipedia, provide coverage for a large set of instances. 0 0
Modelling Provenance of DBpedia Resources Using Wikipedia Contributions
Fabrizio Orlandi, Alexandre Passant
Web Semantics: Science, Services and Agents on the World Wide Web English 2011 DBpedia is one of the largest datasets in the Linked Open Data cloud. Its centrality and its cross-domain nature makes it one of the most important and most referred to knowledge bases on the Web of Data, generally used as a reference for data interlinking. Yet, in spite of its authoritative aspect, there is no work so far tackling the provenance aspect of DBpedia statements. By being extracted from Wikipedia, an open and collaborative encyclopedia, delivering provenance information about it would help to ensure trustworthiness of its data, a major need for people using DBpedia data for building applications. To overcome this problem, we propose an approach for modelling and managing provenance on DBpedia using Wikipedia edits, and making this information available on the Web of Data. In this paper, we describe the framework that we implemented to do so, consisting in (1) a lightweight modelling solution to semantically represent provenance of both DBpedia resources and Wikipedia content, along with mappings to popular ontologies such as the W7 (what, when, where, how, who, which, and why) and OPM (Open Provenance Model) models, (2) an information extraction process and a provenance-computation system combining Wikipedia articles' history with DBpedia information, (3) a set of scripts to make provenance information about DBpedia statements directly available when browsing this source, as well as being publicly exposed in RDF for letting software agents consume it. 0 0
Modelling provenance of DBpedia resources using Wikipedia contributions
Fabrizio Orlandi, Alexandre Passant
Journal of Web Semantics English 2011 DBpedia is one of the largest datasets in the linked Open Data cloud. Its centrality and its cross-domain nature makes it one of the most important and most referred to knowledge bases on the Web of Data, generally used as a reference for data interlinking. Yet, in spite of its authoritative aspect, there is no work so far tackling the provenance aspect of DBpedia statements. By being extracted from Wikipedia, an open and collaborative encyclopedia, delivering provenance information about it would help to ensure trustworthiness of its data, a major need for people using DBpedia data for building applications. To overcome this problem, we propose an approach for modelling and managing provenance on DBpedia using Wikipedia edits, and making this information available on the Web of Data. In this paper, we describe the framework that we implemented to do so, consisting in (1) a lightweight modelling solution to semantically represent provenance of both DBpedia resources and Wikipedia content, along with mappings to popular ontologies such as the W7 - what, when, where, how, who, which, and why - and OPM - open provenance model - models, (2) an information extraction process and a provenance-computation system combining Wikipedia articles' history with DBpedia information, (3) a set of scripts to make provenance information about DBpedia statements directly available when browsing this source, as well as being publicly exposed in RDF for letting software agents consume it. © 2011 Elsevier B.V. 0 0
Multipedia: Enriching DBpedia with multimedia information
Garcia-Silva A., Max Jakob, Mendes P.N., Christian Bizer
KCAP 2011 - Proceedings of the 2011 Knowledge Capture Conference English 2011 Enriching knowledge bases with multimedia information makes it possible to complement textual descriptions with visual and audio information. Such complementary information can help users to understand the meaning of assertions, and in general improve the user experience with the knowledge base. In this paper we address the problem of how to enrich ontology instances with candidate images retrieved from existing Web search engines. DBpedia has evolved into a major hub in the Linked Data cloud, interconnecting millions of entities organized under a consistent ontology. Our approach taps into the Wikipedia corpus to gather context information for DBpedia instances and takes advantage of image tagging information when this is available to calculate semantic relatedness between instances and candidate images. We performed experiments with focus on the particularly challenging problem of highly ambiguous names. Both methods presented in this work outperformed the baseline. Our best method leveraged context words from Wikipedia, tags from Flickr and type information from DBpedia to achieve an average precision of 80%. 0 0
Extending SMW+ with a linked data integration framework
Christian Becker, Christian Bizer, Maike Erdmann, Greaves M.
CEUR Workshop Proceedings English 2010 In this paper, we present a project which extends a SMW+ semantic wiki with a Linked Data Integration Framework that performs Web data access, vocabulary mapping, identity resolution, and quality evaluation of Linked Data. As a result, a large collection of neurogenomics-relevant data from the Web can be flexibly transformed into a unified ontology, allowing unified querying, navigation, and visualization; as well as support for wiki-style collaboration, crowdsourcing, and commentary on chosen data sets. 0 0
Extracting structured information from Wikipedia articles to populate infoboxes
Lange D., Bohm C., Naumann F.
International Conference on Information and Knowledge Management, Proceedings English 2010 Roughly every third Wikipedia article contains an infobox - a table that displays important facts about the subject in attribute-value form. The schema of an infobox, i.e., the attributes that can be expressed for a concept, is defined by an infobox template. Often, authors do not specify all template attributes, resulting in incomplete infoboxes. With iPopulator, we introduce a system that automatically populates infoboxes of Wikipedia articles by extracting attribute values from the article's text. In contrast to prior work, iPopulator detects and exploits the structure of attribute values to independently extract value parts. We have tested iPopulator on the entire set of infobox templates and provide a detailed analysis of its effectiveness. For instance, we achieve an average extraction precision of 91% for 1,727 distinct infobox template attributes. 0 0
Semantic search on heterogeneous Wiki systems
Fabrizio Orlandi, Alexandre Passant
WikiSym English 2010 0 1
Talking about data: Sharing richly structured information through blogs and wikis
Benson E., Marcus A., Howahl F., Karger D.
Proceedings of the 19th International Conference on World Wide Web, WWW '10 English 2010 The web has dramatically enhanced people's ability to communicate ideas, knowledge, and opinions. But the authoring tools that most people understand, blogs and wikis, primarily guide users toward authoring text. In this work, we show that substantial gains in expressivity and communication would accrue if people could easily share richly structured information in meaningful visualizations. We then describe several extensions we have created for blogs and wikis that enable users to publish, share, and aggregate such structured information using the same workflows they apply to text. In particular, we aim to preserve those attributes that make blogs and wikis so effective: one-click access to the information, one-click publishing of content, natural authoring interfaces, and the ability to easily copy-and-paste information and visualizations from other sources. 0 0
Using linked data to interpret tables
Mulwad V., Tim Finin, Zareen Syed, Joshi A.
CEUR Workshop Proceedings English 2010 Vast amounts of information are available in structured forms like spreadsheets, database relations, and tables found in documents and on the Web. We describe an approach that uses linked data to interpret such tables and associate their components with nodes in a reference linked data collection. Our proposed framework assigns a class (i.e. type) to table columns, links table cells to entities, and maps inferred relations between columns to properties. The resulting interpretation can be used to annotate tables, confirm existing facts in the linked data collection, and propose new facts to be added. Our implemented prototype uses DBpedia as the linked data collection and Wikitology for background knowledge. We evaluated its performance using a collection of tables from Google Squared, Wikipedia and the Web. 0 0
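The entry above assigns a class (type) to each table column by interpreting its cells against a Linked Data collection. The sketch below shows one simplified step of that idea, using DBpedia directly rather than the Wikitology knowledge base used in the paper: look up the DBpedia ontology types of each cell's candidate entity and take the most frequent type as the column class. The example column of city entities is hypothetical.

```python
# Sketch under simplified assumptions: infer a column's class from the most
# frequent DBpedia ontology type among its cells' candidate entities.
from collections import Counter
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://dbpedia.org/sparql"

def dbpedia_types(resource_uri):
    """Return the DBpedia ontology classes asserted for a resource."""
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery(f"SELECT ?t WHERE {{ <{resource_uri}> a ?t . }}")
    sparql.setReturnFormat(JSON)
    rows = sparql.query().convert()["results"]["bindings"]
    return [r["t"]["value"] for r in rows
            if r["t"]["value"].startswith("http://dbpedia.org/ontology/")]

# Candidate entities linked from the cells of one hypothetical column.
column_cells = [
    "http://dbpedia.org/resource/Berlin",
    "http://dbpedia.org/resource/Paris",
    "http://dbpedia.org/resource/Madrid",
]

counts = Counter(t for cell in column_cells for t in dbpedia_types(cell))
print("Most likely column class:", counts.most_common(1)[0][0])
```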
DBpedia – A Crystallization Point for the Web of Data
Christian Bizer, Jens Lehmann, Georgi Kobilarov, Sören Auer, Christian Becker, Richard Cyganiak, Sebastian Hellmann
Journal of Web Semantics: Science, Services and Agents on the World Wide Web English 2009 The DBpedia project is a community effort to extract structured information from Wikipedia and to make this information accessible on the Web. The resulting DBpedia knowledge base currently describes over 2.6 million entities. For each of these entities, DBpedia defines a globally unique identifier that can be dereferenced over the Web into a rich RDF description of the entity, including human-readable definitions in 30 languages, relationships to other resources, classifications in four concept hierarchies, various facts as well as data-level links to other Web data sources describing the entity. Over the last year, an increasing number of data publishers have begun to set data-level links to DBpedia resources, making DBpedia a central interlinking hub for the emerging Web of data. Currently, the Web of interlinked data sources around DBpedia provides approximately 4.7 billion pieces of information and covers domains such as geographic information, people, companies, films, music, genes, drugs, books, and scientific publications. This article describes the extraction of the DBpedia knowledge base, the current status of interlinking DBpedia with other data sources on the Web, and gives an overview of applications that facilitate the Web of Data around DBpedia. 0 0
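The abstract above notes that each DBpedia entity has a globally unique identifier that can be dereferenced over the Web into a rich RDF description. A small sketch of doing exactly that from Python is shown below, using the requests and rdflib libraries; the choice of Berlin as the example resource is arbitrary.

```python
# A minimal sketch of dereferencing a DBpedia identifier into RDF: request
# the resource URI with an RDF Accept header and parse the returned Turtle.
import requests
from rdflib import Graph

uri = "http://dbpedia.org/resource/Berlin"
resp = requests.get(uri, headers={"Accept": "text/turtle"}, allow_redirects=True)

g = Graph()
g.parse(data=resp.text, format="turtle")
print(f"{len(g)} triples describing <{uri}>")
```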
Enabling cross-wikis integration by extending the SIOC ontology
Fabrizio Orlandi, Alexandre Passant
CEUR Workshop Proceedings English 2009 This paper describes how we extended the SIOC ontology to take into account particular aspects of wikis in order to enable integration capabilities between various wiki systems. In particular, we will overview the proposed extensions and detail a webservice providing SIOC data from any MediaWiki instance, as well as related query examples that show how different wikis, designed as independent data silos, can be uniformly queried and interlinked. 0 0
Is there anything worth finding on the Semantic Web?
Halpin H.
WWW'09 - Proceedings of the 18th International World Wide Web Conference English 2009 There has recently been an upsurge of interest in the possibilities of combining structured data and ad-hoc information retrieval from traditional hypertext. In this experiment, we run queries extracted from a query log of a major search engine against the Semantic Web to discover if the Semantic Web has anything of interest to the average user. We show that there is indeed much information on the Semantic Web that could be relevant for many queries for people, places and even abstract concepts, although they are overwhelmingly clustered around a Semantic Web-enabled export of Wikipedia known as DBPedia. Copyright is held by the author/owner(s). 0 0
Towards an interlinked semantic wiki farm
Alexandre Passant, Laublet P.
CEUR Workshop Proceedings English 2008 This paper details the main concepts and the architecture of UfoWiki, a semantic wiki farm - i.e. a server of wikis - that uses form-based templates to produce ontology-based knowledge. Moreover, the system allows different wikis to share and interlink ontology instances with each other, so that knowledge can be produced by different and distinct communities in a distributed but collaborative way. 0 0
ZLinks: Semantic framework for invoking contextual linked data
Bergman M.K., Giasson F.
CEUR Workshop Proceedings English 2008 This first-ever demonstration of the new zLinks plug-in shows how any existing Web document link can be automatically transformed into a portal to relevant Linked Data. Each existing link disambiguates to its contextual and relevant subject concept (SC) or named entity (NE). The SCs are grounded in the OpenCyc knowledge base, supplemented by aliases and WordNet synsets to aid disambiguation. The NEs are drawn from Wikipedia as processed via YAGO, and other online fact-based repositories. The UMBEL ontology basis to this framework offers significant further advantages. The zLinks popup is invoked only as desired via unobtrusive user interface cues. 0 0