Knowledge Extraction

From WikiPapers

Knowledge extraction is included as a keyword or extra keyword in 0 datasets, 0 tools and 23 publications.

Datasets

There are no datasets for this keyword.

Tools

There are no tools for this keyword.


Publications

Title Author(s) Published in Language Date Abstract R C
Improving web search results with explanation-aware snippets: An experimental study Wira-Alam A.
Zloch M.
WEBIST 2013 - Proceedings of the 9th International Conference on Web Information Systems and Technologies English 2013 In this paper, we focus on a typical task in web search, in which users want to discover the coherency between two concepts on the Web. In our point of view, this task can be seen as a retrieval process: starting with some source information, the goal is to find target information by following hyperlinks. Given two concepts, e.g. chemistry and gunpowder, are search engines able to find the coherency and explain it? In this paper, we introduce a novel way of linking two concepts by following paths of hyperlinks and collecting short text snippets. We implemented a proof-of-concept prototype, which extracts paths and snippets from Wikipedia articles. Our goal is to provide the user with an overview of the coherency, enriching the connection with a short but meaningful description. In our experimental study, we compare the results of our approach with the capability of web search engines. The results show that 72% of the participants find ours better than those of web search engines. 0 0
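The path-and-snippet idea summarized in this abstract can be sketched roughly as follows; the toy link graph, snippets, and function names are invented for illustration and are not the authors' prototype:

```python
# Minimal sketch: find a path of hyperlinks between two concepts with BFS and
# report the text snippet stored on each traversed link. The LINKS graph below
# is a tiny invented stand-in for Wikipedia's link structure.
from collections import deque

# article -> {linked article: snippet surrounding the hyperlink}
LINKS = {
    "Chemistry": {"Gunpowder": "early chemistry grew in part out of work on gunpowder ..."},
    "Gunpowder": {"Saltpetre": "gunpowder is a mixture containing saltpetre ..."},
    "Saltpetre": {},
}

def find_path(source, target):
    """Breadth-first search over the hyperlink graph; returns a list of article titles."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in LINKS.get(path[-1], {}):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def explain(source, target):
    """Return the snippets along the hyperlink path as a short textual explanation."""
    path = find_path(source, target)
    if path is None:
        return None
    return [LINKS[a][b] for a, b in zip(path, path[1:])]

print(explain("Chemistry", "Saltpetre"))
```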
Representation and verification of attribute knowledge Zhang C.
Niu Z.
Shi C.
Tan M.
Fu H.
Xu S.
Lecture Notes in Computer Science English 2013 With the increasing growth and popularization of the Internet, knowledge extraction from the web is an important issue in the fields of web mining, ontology engineering and intelligent information processing. The availability of real, large corpora and the development of Internet and machine learning technologies make it feasible to acquire massive amounts of knowledge from the web. In addition, many web-based encyclopedias such as Wikipedia and Baidu Baike include much structured knowledge. However, knowledge quality problems, including incorrectness, inconsistency, and incompleteness, are a serious obstacle to the wide practical application of such extracted and structured knowledge. In this paper, we build a taxonomy of relations between attributes of concepts, and propose an approach driven by this taxonomy of attribute relations to evaluate knowledge about the attribute values of entities. We also describe an application of our approach to building and verifying attribute knowledge of entities in different domains. 0 0
A knowledge-extraction approach to identify and present verbatim quotes in free text Paass G.
Bergholz A.
Pilz A.
ACM International Conference Proceeding Series English 2012 In news stories, verbatim quotes of persons play a very important role, as they carry reliable information about the opinion of that person concerning specific aspects. As thousands of new quotes are published every hour, it is very difficult to keep track of them. In this paper we describe a set of algorithms to solve the knowledge management problem of identifying, storing and accessing verbatim quotes. We treat the verbatim quote task as a relation extraction problem over unstructured text. Using a workflow of knowledge extraction algorithms, we provide the required features for the relation extraction algorithm. The central relation extraction procedure is trained using manually annotated documents. It turns out that structural grammatical information is able to improve the F-value for verbatim quote detection to 84.1%, which is sufficient for many exploratory applications. We present the results in a smartphone app connected to a web server, which employs a number of algorithms, such as linkage to Wikipedia, topic extraction and search engine indices, to provide flexible access to the extracted verbatim quotes. 0 0
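The quote-identification step can be illustrated with a deliberately simplified, regex-based sketch; the paper itself trains a relation extractor on grammatical features, so the pattern and names below are only illustrative assumptions:

```python
# Toy quote detector: only matches an explicit '<Speaker> said, "..."' pattern,
# unlike the trained, feature-based extractor described in the abstract.
import re

QUOTE_PATTERN = re.compile(
    r'(?P<speaker>[A-Z][\w.]*(?:\s+[A-Z][\w.]*)*)\s+said,?\s+"(?P<quote>[^"]+)"'
)

def extract_quotes(text):
    """Return (speaker, verbatim quote) pairs found in the text."""
    return [(m.group("speaker"), m.group("quote")) for m in QUOTE_PATTERN.finditer(text)]

sample = 'Angela Merkel said, "We can do this." Later, analysts disagreed.'
print(extract_quotes(sample))  # [('Angela Merkel', 'We can do this.')]
```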
Approach for building ontology automatically based on Wikipedia Wu T.
Xiao K.
Tan X.
ICIC Express Letters, Part B: Applications English 2012 Building ontologies is the groundwork of many Web 2.0 applications. As one of the most important public knowledge bases, Wikipedia has many comparative advantages in this research field. In this paper, we propose a new method for extracting domain-oriented semantic knowledge from Wikipedia. During the process, every category in the domain is assigned a weight, so that we can calculate a score for each article. As a result, a light ontology of the software domain is built automatically from the semantic knowledge. In addition, the extracted semantic knowledge is evaluated manually. 0 0
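The category-weighting idea can be sketched in a few lines; the weights, categories, and article assignments below are invented for illustration:

```python
# Minimal sketch: each domain category carries a weight, and an article's score
# is the sum of the weights of the categories it belongs to. All values are
# hypothetical placeholders.
CATEGORY_WEIGHTS = {
    "Software engineering": 1.0,
    "Programming languages": 0.8,
    "Mathematics": 0.1,
}

ARTICLE_CATEGORIES = {
    "Compiler": ["Software engineering", "Programming languages"],
    "Prime number": ["Mathematics"],
}

def article_score(article):
    """Score an article by summing the weights of its known domain categories."""
    return sum(CATEGORY_WEIGHTS.get(c, 0.0) for c in ARTICLE_CATEGORIES.get(article, []))

for a in ARTICLE_CATEGORIES:
    print(a, article_score(a))
```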
DBpedia and the live extraction of structured data from Wikipedia Morsey M.
Jens Lehmann
Sören Auer
Claus Stadler
Sebastian Hellmann
Program English 2012 Purpose: DBpedia extracts structured information from Wikipedia, interlinks it with other knowledge bases and freely publishes the results on the web using Linked Data and SPARQL. However, the DBpedia release process is heavyweight and releases are sometimes based on several-months-old data. DBpedia-Live solves this problem by providing a live synchronization method based on the update stream of Wikipedia. This paper seeks to address these issues. Design/methodology/approach: Wikipedia provides DBpedia with a continuous stream of updates, i.e. a stream of recently updated articles. DBpedia-Live processes that stream on the fly to obtain RDF data and stores the extracted data back into DBpedia. DBpedia-Live publishes the newly added/deleted triples in files, in order to enable synchronization between the DBpedia endpoint and other DBpedia mirrors. Findings: During the realization of DBpedia-Live the authors learned that it is crucial to process Wikipedia updates in a priority queue. Recently updated Wikipedia articles should have the highest priority, ahead of mapping changes and unmodified pages. An overall finding is that there are plenty of opportunities arising from the emerging Web of Data for librarians. Practical implications: DBpedia had and has a great effect on the Web of Data and became a crystallization point for it. Many companies and researchers use DBpedia and its public services to improve their applications and research approaches. The DBpedia-Live framework improves DBpedia further by synchronizing it with Wikipedia in a timely manner, which is relevant for many use cases requiring up-to-date information. Originality/value: The new DBpedia-Live framework adds new features to the old DBpedia-Live framework, e.g. abstract extraction, ontology changes, and changeset publication. 0 0
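The priority-queue finding can be illustrated with a minimal sketch; the priority values and update items are assumptions made for illustration, not the DBpedia-Live implementation:

```python
# Sketch of prioritized update processing: recently updated articles are handled
# before mapping changes and unmodified pages. Priorities and items are invented.
import heapq

# lower number = higher priority
PRIORITY = {"article_update": 0, "mapping_change": 1, "unmodified_page": 2}

queue = []
for seq, (kind, page) in enumerate([
    ("unmodified_page", "Berlin"),
    ("article_update", "Leipzig"),
    ("mapping_change", "Infobox settlement"),
]):
    # seq preserves insertion order among items of equal priority
    heapq.heappush(queue, (PRIORITY[kind], seq, kind, page))

while queue:
    _, _, kind, page = heapq.heappop(queue)
    print(f"processing {kind}: {page}")
```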
Domain-oriented semantic knowledge extraction Xiao K.
Li B.
Tan X.
Journal of Computational Information Systems English 2012 Semantic knowledge extraction is the groundwork of ontology building. As one of the most important public knowledge bases, Wikipedia has many comparative advantages in this research field. In this paper, we propose a new method for extracting domain-oriented semantic knowledge from Wikipedia. During the process, every category in the domain is assigned a weight, so that we can calculate a score for each article. In addition, practical experience in storing and utilizing Wikipedia's large-scale data is also detailed in this paper. 0 0
Mining Wikipedia's snippets graph: First step to build a new knowledge base Wira-Alam A.
Mathiak B.
CEUR Workshop Proceedings English 2012 In this paper, we discuss aspects of mining links and text snippets from Wikipedia as a new knowledge base. Current knowledge bases, e.g. DBpedia [1], cover mainly the structured part of Wikipedia, but not the content as a whole. Acting as a complement, we focus on extracting information from the text of the articles. We extract a database of the hyperlinks between Wikipedia articles and populate it with the textual context surrounding each hyperlink. This would be useful for network analysis, e.g. to measure the influence of one topic on another, or directly for question answering (for stating the relationship between two entities). First, we describe the technical steps involved in extracting the data from Wikipedia. Second, we specify how to represent the extracted data as an extended triple through a Web service. Finally, we discuss the expected usage possibilities as well as the challenges. 0 0
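One possible reading of the link-plus-context extraction is sketched below; it assumes plain [[Target]] / [[Target|label]] wikitext links and a fixed character window, which is a simplification of real wikitext parsing:

```python
# Toy sketch: pull internal links out of wikitext and attach the surrounding
# text as a snippet, producing (source, target, snippet) records.
import re

LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

def links_with_context(source_title, wikitext, window=30):
    """Yield (source, target, snippet) triples for each internal link."""
    for m in LINK.finditer(wikitext):
        start, end = m.span()
        snippet = wikitext[max(0, start - window):min(len(wikitext), end + window)]
        yield (source_title, m.group(1), snippet)

text = "Gunpowder was an early product of [[chemistry|chemical]] experimentation in [[China]]."
for triple in links_with_context("Gunpowder", text):
    print(triple)
```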
Mining a multilingual association dictionary from Wikipedia for cross-language information retrieval Ye Z.
Huang J.X.
He B.
Hong Lin
Journal of the American Society for Information Science and Technology English 2012 Wikipedia is characterized by its dense link structure and a large number of articles in different languages, which make it a notable Web corpus for knowledge extraction and mining, in particular for mining the multilingual associations. In this paper, motivated by a psychological theory of word meaning, we propose a graph-based approach to constructing a cross-language association dictionary (CLAD) from Wikipedia, which can be used in a variety of cross-language accessing and processing applications. In order to evaluate the quality of the mined CLAD, and to demonstrate how the mined CLAD can be used in practice, we explore two different applications of the mined CLAD to cross-language information retrieval (CLIR). First, we use the mined CLAD to conduct cross-language query expansion; and, second, we use it to filter out translation candidates with low translation probabilities. Experimental results on a variety of standard CLIR test collections show that the CLIR retrieval performance can be substantially improved with the above two applications of CLAD, which indicates that the mined CLAD is of sound quality. 0 0
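How a mined cross-language association dictionary might drive query expansion (the paper's first application) can be sketched as follows; the dictionary entries and weights are invented placeholders, not mined values:

```python
# Toy cross-language query expansion with a (hypothetical) CLAD: each source
# term contributes its strongest target-language associations to the query.
CLAD = {
    # English term -> weighted associations in another language (invented values)
    "earthquake": [("地震", 0.9), ("震源", 0.4), ("海啸", 0.2)],
    "rescue": [("救援", 0.8), ("搜救", 0.5)],
}

def expand_query(terms, top_k=2, min_weight=0.3):
    """Expand a source-language query with strongly associated target-language terms."""
    expansion = []
    for t in terms:
        candidates = sorted(CLAD.get(t, []), key=lambda x: -x[1])
        expansion += [w for w, s in candidates[:top_k] if s >= min_weight]
    return terms + expansion

print(expand_query(["earthquake", "rescue"]))
```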
Automatic knowledge extraction from manufacturing research publications Boonyasopon P.
Riel A.
Uys W.
Louw L.
Tichkiewitch S.
Du Preez N.
CIRP Annals - Manufacturing Technology English 2011 Knowledge mining is a young and rapidly growing discipline aiming at automatically identifying valuable knowledge in digital documents. This paper presents the results of a study of the application of document retrieval and text mining techniques to extract knowledge from CIRP research papers. The target is to find out if and how such tools can help researchers find relevant publications in a cluster of papers and increase the citation indices of their own papers. Two different approaches to automatic topic identification are investigated. One is based on Latent Dirichlet Allocation applied to a huge document set; the other uses Wikipedia to discover significant words in papers. The study uses a combination of both approaches to propose a new approach to efficient and intelligent knowledge mining. 0 0
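The first topic-identification approach (LDA over a document set) can be sketched with scikit-learn; the tiny corpus and parameter choices below are illustrative only:

```python
# Minimal LDA topic-identification sketch with scikit-learn; the study applies
# this kind of model to a large set of research papers, not four toy strings.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "milling tool wear prediction for machining processes",
    "tool wear monitoring in high speed milling",
    "knowledge management for product design teams",
    "design knowledge reuse in engineering teams",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-4:][::-1]]
    print(f"topic {i}: {top}")
```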
Extracting events from Wikipedia as RDF triples linked to widespread semantic web datasets Carlo Aliprandi
Francesco Ronzano
Andrea Marchetti
Maurizio Tesconi
Salvatore Minutoli
Lecture Notes in Computer Science English 2011 Many attempts have been made to extract structured data from Web resources, exposing them as RDF triples and interlinking them with other RDF datasets: in this way it is possible to create clouds of highly integrated Semantic Web data collections. In this paper we describe an approach to enhance the extraction of semantic contents from unstructured textual documents, in particular considering Wikipedia articles and focusing on event mining. Starting from the deep parsing of a set of English Wikipedia articles, we produce a semantic annotation compliant with the Knowledge Annotation Format (KAF). We extract events from the KAF semantic annotation and then we structure each event as a set of RDF triples linked to both DBpedia and WordNet. We point out examples of automatically mined events, providing some general evaluation of how our approach may discover new events and link them to existing content. 0 0
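Representing one mined event as RDF triples linked to DBpedia might look roughly like the sketch below, using rdflib; the event vocabulary (the EX namespace) and the triples themselves are invented for illustration:

```python
# Hedged sketch: one event expressed as RDF triples whose actor and location
# point at DBpedia resources. The EX vocabulary is hypothetical; the paper
# derives its events from a KAF annotation of Wikipedia text.
from rdflib import Graph, Namespace, URIRef, Literal

EX = Namespace("http://example.org/event/")   # hypothetical event vocabulary
DBR = Namespace("http://dbpedia.org/resource/")

g = Graph()
event = URIRef("http://example.org/event/e1")
g.add((event, EX.actor, DBR["Neil_Armstrong"]))
g.add((event, EX.action, Literal("landed")))
g.add((event, EX.location, DBR["Moon"]))
g.add((event, EX.year, Literal(1969)))

print(g.serialize(format="turtle"))
```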
Extracting events from Wikipedia as RDF triples linked to widespread semantic web datasets Carlo Aliprandi
Francesco Ronzano
Andrea Marchetti
Maurizio Tesconi
Salvatore Minutoli
OCSC English 2011 0 0
Semantic relatedness measurement based on Wikipedia link co-occurrence analysis Masahiro Ito
Kotaro Nakayama
Takahiro Hara
Shojiro Nishio
International Journal of Web Information Systems English 2011 Purpose: Recently, the importance and effectiveness of Wikipedia Mining has been shown in several studies. One popular research area in Wikipedia Mining focuses on semantic relatedness measurement, and research in this area has shown that Wikipedia can be used for semantic relatedness measurement. However, previous methods face two problems: accuracy and scalability. To solve these problems, the purpose of this paper is to propose an efficient semantic relatedness measurement method that leverages global statistical information of Wikipedia. Furthermore, a new test collection is constructed based on Wikipedia concepts for evaluating semantic relatedness measurement methods. Design/methodology/approach: The authors' approach leverages global statistical information of the whole of Wikipedia to compute semantic relatedness among concepts (disambiguated terms) by analyzing co-occurrences of link pairs in all Wikipedia articles. In Wikipedia, an article represents a concept and a link to another article represents a semantic relation between these two concepts. Thus, the co-occurrence of a link pair indicates the relatedness of a concept pair. Furthermore, the authors propose an integration method with tf-idf as an improved method to additionally leverage local information in an article. In addition, to construct a new test collection, the authors select a large number of concepts from Wikipedia. The relatedness of these concepts is judged by human test subjects. Findings: An experiment was conducted to evaluate the calculation cost and accuracy of each method. The experimental results show that the calculation cost of this approach is very low compared to that of one of the previous methods, and that it is more accurate than all previous methods for computing semantic relatedness. Originality/value: This is the first proposal of co-occurrence analysis of Wikipedia links for semantic relatedness measurement. The authors show that this approach is effective for measuring semantic relatedness among concepts in terms of calculation cost and accuracy. The findings may be useful to researchers who are interested in knowledge extraction, as well as to ontology researchers. 0 0
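The core co-occurrence idea can be sketched in a few lines; the toy article-to-links data and the Dice-style normalization are simplifying assumptions, not the paper's exact formula:

```python
# Minimal sketch: two concepts are related to the degree that links to them
# co-occur in the same articles. Toy data; Dice-style normalization assumed.
from itertools import combinations
from collections import Counter

ARTICLE_LINKS = {
    "A1": {"Tokyo", "Osaka", "Japan"},
    "A2": {"Tokyo", "Japan"},
    "A3": {"Osaka", "Baseball"},
}

link_count = Counter()
pair_count = Counter()
for links in ARTICLE_LINKS.values():
    link_count.update(links)
    pair_count.update(frozenset(p) for p in combinations(sorted(links), 2))

def relatedness(a, b):
    """Dice-style co-occurrence score between two link targets."""
    co = pair_count[frozenset((a, b))]
    return 2 * co / (link_count[a] + link_count[b]) if co else 0.0

print(relatedness("Tokyo", "Japan"))      # high: they co-occur in A1 and A2
print(relatedness("Tokyo", "Baseball"))   # zero: they never co-occur
```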
Toward a semantic vocabulary for systems engineering Di Maio P. ACM International Conference Proceeding Series English 2011 The web can be the most efficient medium for sharing knowledge, provided appropriate technological artifacts such as controlled vocabularies and metadata are adopted. In our research we study the degree of such adoption applied to the systems engineering domain. This paper is a work-in-progress report discussing issues surrounding knowledge extraction and representation, proposing an integrated approach to tackle various challenges associated with the development of a shared vocabulary for the practice. 0 0
An automatic acquisition of domain knowledge from list-structured text in Baidu encyclopedia Wu W.
Liu T.
Hu H.
Du X.
2010 4th International Universal Communication Symposium, IUCS 2010 - Proceedings English 2010 We propose a novel method which can automatically extract new concepts and semantic relations between concepts, in order to support domain ontology evolution. We collect the corpus from a free Chinese encyclopedia called Baidu encyclopedia, which is similar to Wikipedia. We locate lists in the Baidu encyclopedia and extract domain knowledge from them. Furthermore, we use a knowledge assessor to ensure the validity of the extracted knowledge. In the experiments, we make a practical attempt to evolve the Chinese Law Ontology (CLO V0), and show that our method can improve the completeness and coverage of CLO V0. 0 0
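The list-to-knowledge step can be illustrated with a toy sketch; the markup format and the relation label are invented simplifications, and the paper's knowledge assessor is omitted:

```python
# Toy sketch: list items under a heading become candidate relation triples with
# the heading concept. Markup and relation label are hypothetical.
SECTION = """Branches of law
* Criminal law
* Civil law
* Administrative law
"""

def candidates_from_list(section_text, relation="has_subtype"):
    """Turn a heading plus bulleted list into (heading, relation, item) candidates."""
    lines = [l.strip() for l in section_text.strip().splitlines() if l.strip()]
    heading = lines[0]
    items = [l[1:].strip() for l in lines[1:] if l.startswith("*")]
    return [(heading, relation, item) for item in items]

for triple in candidates_from_list(SECTION):
    print(triple)
```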
DBpedia – A Crystallization Point for the Web of Data Christian Bizer
Jens Lehmann
Georgi Kobilarov
Sören Auer
Christian Becker
Richard Cyganiak
Sebastian Hellmann
Journal of Web Semantics: Science, Services and Agents on the World Wide Web English 2009 The DBpedia project is a community effort to extract structured information from Wikipedia and to make this information accessible on the Web. The resulting DBpedia knowledge base currently describes over 2.6 million entities. For each of these entities, DBpedia defines a globally unique identifier that can be dereferenced over the Web into a rich RDF description of the entity, including human-readable definitions in 30 languages, relationships to other resources, classifications in four concept hierarchies, various facts as well as data-level links to other Web data sources describing the entity. Over the last year, an increasing number of data publishers have begun to set data-level links to DBpedia resources, making DBpedia a central interlinking hub for the emerging Web of data. Currently, the Web of interlinked data sources around DBpedia provides approximately 4.7 billion pieces of information and covers domains such as geographic information, people, companies, films, music, genes, drugs, books, and scientific publications. This article describes the extraction of the DBpedia knowledge base, the current status of interlinking DBpedia with other data sources on the Web, and gives an overview of applications that facilitate the Web of Data around DBpedia. 0 0
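Accessing the DBpedia knowledge base described here might look like the sketch below, which queries the public SPARQL endpoint with the SPARQLWrapper library; it assumes network access and that the endpoint at https://dbpedia.org/sparql is reachable:

```python
# Sketch: list a few data-level links (owl:sameAs) for one DBpedia entity via
# the public SPARQL endpoint. Requires network access and SPARQLWrapper.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    SELECT ?same WHERE {
        <http://dbpedia.org/resource/Berlin> owl:sameAs ?same .
    } LIMIT 5
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["same"]["value"])
```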
Mining concepts from Wikipedia for ontology construction Gaoying Cui
Lu Q.
Li W.
Yirong Chen
Proceedings - 2009 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT Workshops 2009 English 2009 An ontology is a structured knowledge base of concepts organized by the relations among them. However, concepts are usually mixed with their instances in the corpora used for knowledge extraction. Concepts and their corresponding instances share similar features and are difficult to distinguish. In this paper, a novel approach is proposed to comprehensively obtain concepts with the help of definition sentences and category labels in Wikipedia pages. N-gram statistics and other NLP knowledge are used to help extract appropriate concepts. The proposed method identified nearly 50,000 concepts from about 700,000 Wiki pages. A precision of 78.5% makes it an effective approach to mining concepts from Wikipedia for ontology construction. 0 0
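The definition-sentence idea can be illustrated with a toy sketch; the regex and sample sentences stand in for the paper's NLP machinery and n-gram statistics:

```python
# Toy sketch: the noun phrase after "is a/an" in a definition sentence is
# proposed as a concept, and only candidates supported by more than one
# sentence are kept. Sentences and pattern are illustrative only.
import re
from collections import Counter

DEFINITION = re.compile(r"\bis an? ([a-z]+(?: [a-z]+)?)")

first_sentences = [
    "Python is a programming language designed by Guido van Rossum.",
    "Haskell is a programming language with lazy evaluation.",
    "Berlin is a city in Germany.",
]

candidates = Counter()
for sent in first_sentences:
    m = DEFINITION.search(sent)
    if m:
        candidates[m.group(1)] += 1

# concepts supported by more than one definition sentence are kept
print([c for c, n in candidates.items() if n > 1])  # ['programming language']
```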
Unsupervised knowledge extraction for taxonomies of concepts from Wikipedia Barbu E.
Poesio M.
International Conference Recent Advances in Natural Language Processing, RANLP English 2009 A novel method for unsupervised acquisition of knowledge for taxonomies of concepts from raw Wikipedia text is presented. We assume that the concepts classified under the same node in a taxonomy are described in a comparable way in Wikipedia. The concepts in 6 taxonomies extracted from WordNet are mapped onto Wikipedia pages and the lexico-syntactic patterns describing semantic structures expressing relevant knowledge for the concepts are automatically learnt. 0 0
Weblogs as a source for extracting general world knowledge Gordon J.
Van Durme B.
Schubert L.
K-CAP'09 - Proceedings of the 5th International Conference on Knowledge Capture English 2009 Knowledge extraction (KE) efforts have often used corpora of heavily edited writing and sources written to provide the desired knowledge (e.g., newspapers or textbooks). However, the proliferation of diverse, up-to-date, unedited writing on the Web, especially in weblogs, offers new challenges for KE tools. We describe our efforts to extract general knowledge implicit in this noisy data and examine whether such sources can be an adequate substitute for resources like Wikipedia. 0 0
Extracting structured knowledge for Semantic Web by mining Wikipedia Kotaro Nakayama CEUR Workshop Proceedings English 2008 Since Wikipedia has become a huge-scale database storing a wide range of human knowledge, it is a promising corpus for knowledge extraction. A considerable amount of research on Wikipedia mining has been conducted and the fact that Wikipedia is an invaluable corpus has been confirmed. Wikipedia's impressive characteristics are not limited to its scale, but also include the dense link structure, URIs for word sense disambiguation, well-structured Infoboxes, and the category tree. One of the popular approaches in Wikipedia mining is to use Wikipedia's category tree as an ontology, and a number of researchers have shown, with significant results, that Wikipedia's categories are promising resources for ontology construction. In this work, we try to demonstrate the capability of Wikipedia as a corpus for knowledge extraction and how it works in the Semantic Web environment. We show two achievements: Wikipedia Thesaurus, a huge-scale association thesaurus built by mining Wikipedia's link structure, and Wikipedia Ontology, a Web ontology extracted by mining Wikipedia articles. 0 0
Meliorated approach for extracting bilingual terminology from Wikipedia Ajay Gupta
Goyal A.
Bindal A.
Proceedings of 11th International Conference on Computer and Information Technology, ICCIT 2008 English 2008 With the demand for accurate and domain-specific bilingual dictionaries, research in the field of automatic dictionary extraction has become popular. Due to the lack of domain-specific terminology in parallel corpora, extraction of bilingual terminology from Wikipedia (a corpus for knowledge extraction with a huge number of articles, links between different languages, a dense link structure and a number of redirect pages) has opened up a new line of research in the field of bilingual dictionary creation. Our method not only analyzes interlanguage links along with redirect page titles and link-text titles but also filters out inaccurate translation candidates using pattern matching. The score of each translation candidate is calculated using page parameters, and an appropriate threshold is then set, in contrast to the previous approach, which was based solely on backward links. In our experiment, we demonstrated the advantages of our approach compared to the traditional approach. 0 0
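The candidate-scoring step can be sketched as follows; the page records, parameter weights, and threshold are invented for illustration rather than taken from the paper:

```python
# Toy sketch: score bilingual translation candidates gathered from interlanguage
# links and redirects, then keep those above a threshold. All values invented.
PAGES = [
    # (English title, candidate title, from_interlanguage_link, from_redirect, backward_links)
    ("Computer", "कंप्यूटर", True, False, 120),
    ("Computer", "संगणक", False, True, 15),
    ("Computer", "अभिकलित्र", False, False, 2),
]

def score(interlang, redirect, backlinks):
    """Weighted combination of page parameters (weights are hypothetical)."""
    return 2.0 * interlang + 1.0 * redirect + 0.01 * backlinks

THRESHOLD = 1.0
accepted = [
    (en, cand, round(score(il, rd, bl), 2))
    for en, cand, il, rd, bl in PAGES
    if score(il, rd, bl) >= THRESHOLD
]
print(accepted)
```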
Wikipedia link structure and text mining for semantic relation extraction towards a huge scale global web ontology Kotaro Nakayama
Takahiro Hara
Shojiro Nishio
CEUR Workshop Proceedings English 2008 Wikipedia, a collaborative wiki-based encyclopedia, has become a huge phenomenon among Internet users. It covers a huge number of concepts from various fields such as Arts, Geography, History, Science, Sports and Games. Since it is becoming a database storing all human knowledge, Wikipedia mining is a promising approach that bridges the Semantic Web and the Social Web (a.k.a. Web 2.0). In fact, previous research on Wikipedia mining has strongly demonstrated that Wikipedia has a remarkable capability as a corpus for knowledge extraction, especially for relatedness measurement among concepts. However, semantic relatedness is just a numerical strength of a relation and does not have an explicit relation type. To extract inferable semantic relations with explicit relation types, we need to analyze not only the link structure but also the texts in Wikipedia. In this paper, we propose a consistent approach to semantic relation extraction from Wikipedia. The method consists of three sub-processes highly optimized for Wikipedia mining: 1) fast preprocessing, 2) POS (part-of-speech) tag tree analysis, and 3) mainstay extraction. Furthermore, our detailed evaluation showed that link structure mining improves both the accuracy and the scalability of semantic relation extraction. 0 0
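The POS-tag analysis step can be illustrated with a pattern-based toy; the hard-coded tags stand in for a real tagger, and the single pattern is a drastic reduction of the paper's tag-tree analysis:

```python
# Toy sketch: given a POS-tagged definition sentence (tags hard-coded here
# instead of produced by a tagger), extract a (subject, relation, object)
# candidate around the copula.
TAGGED = [
    ("Kyoto", "NNP"), ("is", "VBZ"), ("a", "DT"),
    ("city", "NN"), ("in", "IN"), ("Japan", "NNP"),
]

def copula_triple(tagged):
    """Return (subject, 'is-a', object) if the sentence starts with 'NNP is DT NN'."""
    words = [w for w, _ in tagged]
    tags = [t for _, t in tagged]
    if tags[:4] == ["NNP", "VBZ", "DT", "NN"] and words[1] == "is":
        return (words[0], "is-a", words[3])
    return None

print(copula_triple(TAGGED))  # ('Kyoto', 'is-a', 'city')
```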
Wikipedia mining for huge scale Japanese association thesaurus construction Kotaro Nakayama
Masahiro Ito
Takahiro Hara
Shojiro Nishio
Proceedings - International Conference on Advanced Information Networking and Applications, AINA English 2008 Wikipedia, a huge-scale Web-based dictionary, is an impressive corpus for knowledge extraction. We have already shown that Wikipedia can be used for constructing an English association thesaurus and that our link structure mining method is significantly effective for this aim. However, we want to find out how we can apply this method to other languages and what the requirements, differences and characteristics are. Nowadays, Wikipedia supports more than 250 languages such as English, German, French, Polish and Japanese. Among Asian languages, the Japanese Wikipedia is the largest corpus in Wikipedia. In this research, therefore, we analyzed all Japanese articles in Wikipedia and constructed a huge-scale Japanese association thesaurus. After constructing the thesaurus, we observed that it shows several impressive characteristics depending on language and culture. 0 0
Wikipedia mining for triple extraction enhanced by co-reference resolution Kotaro Nakayama CEUR Workshop Proceedings English 2008 Since Wikipedia has become a huge-scale database storing a wide range of human knowledge, it is a promising corpus for knowledge extraction. A considerable amount of research on Wikipedia mining has been conducted and the fact that Wikipedia is an invaluable corpus has been confirmed. Wikipedia's impressive characteristics are not limited to its scale, but also include the dense link structure, URIs for word sense disambiguation, well-structured Infoboxes, and the category tree. In previous research in this area, the category tree has been widely used to extract semantic relations among concepts in Wikipedia. In this paper, we try to extract triples (Subject, Predicate, Object) from Wikipedia articles, another promising resource for knowledge extraction. We propose a practical method which integrates link structure mining and parsing to enhance the extraction accuracy. The proposed method contains two technical novelties: two parsing strategies and a co-reference resolution method. 0 0
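The role of co-reference resolution in triple extraction can be illustrated with a naive sketch; the pronoun list and example text are invented, and the real method is considerably more involved:

```python
# Toy sketch: subject pronouns are replaced by the article title before triple
# patterns are matched, so that more sentences yield usable (S, P, O) candidates.
import re

PRONOUNS = r"\b(It|He|She|They)\b"

def resolve_coreference(article_title, text):
    """Naively map capitalized subject pronouns to the article title."""
    return re.sub(PRONOUNS, article_title, text)

article = "Mount Fuji"
text = "Mount Fuji is the highest mountain in Japan. It is an active stratovolcano."
print(resolve_coreference(article, text))
# second sentence becomes: "Mount Fuji is an active stratovolcano."
```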