Shojiro Nishio

From WikiPapers

Shojiro Nishio is an author.


Only those publications related to wikis are shown here.
Title Keyword(s) Published in Language Date Abstract R C
Calculating Wikipedia article similarity using machine translation evaluation metrics Bilingual dictionary
Cross-language Document Similarity
Data mining
Proceedings - 25th IEEE International Conference on Advanced Information Networking and Applications Workshops, WAINA 2011 English 2011 Calculating the similarity of Wikipedia articles in different languages is helpful for bilingual dictionary construction and various other research areas. However, standard methods for document similarity calculation are usually very simple. Therefore, we describe an approach that translates one Wikipedia article into the language of the other article and then calculates article similarity with standard machine translation evaluation metrics. An experiment revealed that our approach is effective for identifying Wikipedia articles in different languages that cover the same concept. 0 0
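As an illustration of the metric-based comparison described above, the following is a minimal sketch (not the authors' implementation) that assumes the translation step has already been applied; it scores two articles with a BLEU-style modified n-gram precision, one of the simplest machine translation evaluation metrics:

```python
from collections import Counter

def ngrams(tokens, n):
    """All n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """BLEU-style modified n-gram precision: clip each candidate
    n-gram count by its count in the reference."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    if not cand:
        return 0.0
    clipped = sum(min(c, ref[g]) for g, c in cand.items())
    return clipped / sum(cand.values())

def article_similarity(translated_article, target_article, max_n=2):
    """Rough cross-language article similarity: average the modified
    precision over 1..max_n grams of the translated article against
    the article in the target language."""
    a = translated_article.lower().split()
    b = target_article.lower().split()
    return sum(modified_precision(a, b, n) for n in range(1, max_n + 1)) / max_n
```

A real system would plug in an actual MT metric (BLEU, NIST, METEOR) and a real translation step; this sketch only shows where the metric sits in the pipeline.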
Semantic relatedness measurement based on Wikipedia link co-occurrence analysis Controlled language construction
Data mining
International Journal of Web Information Systems English 2011 Purpose: Recently, the importance and effectiveness of Wikipedia Mining has been shown in several studies. One popular research area in Wikipedia Mining focuses on semantic relatedness measurement, and work in this area has shown that Wikipedia can be used for semantic relatedness measurement. However, previous methods face two problems: accuracy and scalability. To solve these problems, the purpose of this paper is to propose an efficient semantic relatedness measurement method that leverages global statistical information of Wikipedia. Furthermore, a new test collection is constructed based on Wikipedia concepts for evaluating semantic relatedness measurement methods. Design/methodology/approach: The authors' approach leverages global statistical information of the whole Wikipedia to compute semantic relatedness among concepts (disambiguated terms) by analyzing co-occurrences of link pairs in all Wikipedia articles. In Wikipedia, an article represents a concept and a link to another article represents a semantic relation between these two concepts. Thus, the co-occurrence of a link pair indicates the relatedness of a concept pair. Furthermore, the authors propose an integration method with tfidf as an improved method to additionally leverage local information in an article. Besides, for constructing a new test collection, the authors select a large number of concepts from Wikipedia. The relatedness of these concepts is judged by human test subjects. Findings: An experiment was conducted for evaluating the calculation cost and accuracy of each method. The experimental results show that the calculation cost of this approach is very low compared to one of the previous methods and that it is more accurate than all previous methods for computing semantic relatedness. Originality/value: This is the first proposal of co-occurrence analysis of Wikipedia links for semantic relatedness measurement.
The authors show that this approach is effective for measuring semantic relatedness among concepts with regard to both calculation cost and accuracy. The findings may be useful to researchers who are interested in knowledge extraction, as well as ontology research. 0 0
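A minimal sketch of the core idea above, counting link pair co-occurrences across articles, might look as follows; the Dice-style normalization and the tiny in-memory corpus are assumptions for illustration, not the authors' exact formula:

```python
from collections import Counter
from itertools import combinations

def link_cooccurrence_relatedness(articles):
    """Given {article: set_of_linked_concepts}, count how many
    articles link to each concept pair and return a Dice-normalized
    relatedness score for every co-occurring pair."""
    link_freq = Counter()   # articles linking to each concept
    pair_freq = Counter()   # articles linking to both concepts of a pair
    for links in articles.values():
        link_freq.update(links)
        for a, b in combinations(sorted(links), 2):
            pair_freq[(a, b)] += 1
    return {
        pair: 2 * n / (link_freq[pair[0]] + link_freq[pair[1]])
        for pair, n in pair_freq.items()
    }
```

Because each article is scanned once and only its own link pairs are enumerated, this style of global counting scales far better than comparing article texts pairwise.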
Wikipedia Sets: Context-Oriented Related Entity Acquisition from Multiple Words Data mining
Association thesaurus
Context dependency
WI-IAT English 2011 0 0
Wikipedia sets: Context-oriented related entity acquisition from multiple words Association thesaurus
Context dependency
Data mining
Proceedings - 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011 English 2011 In this paper, we propose a method which acquires related words (entities) from multiple words by naturally disambiguating their meaning and considering their contexts. In addition, we introduce a bootstrapping method for improving the coverage of association relations. Experimental results show that, compared to the ESA-based method, our method can better acquire related words that depend on the contexts of multiple words. 0 0
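The context-dependent acquisition described above can be sketched roughly as follows; the additive scoring over a toy association thesaurus is an assumption for illustration, not the proposed method itself:

```python
def related_entities(queries, assoc):
    """Naive context-oriented acquisition: score each candidate
    entity by summing its association strength with every query
    word, so entities related to all queries rank highest."""
    scores = {}
    for q in queries:
        for entity, strength in assoc.get(q, {}).items():
            if entity in queries:
                continue  # skip the query words themselves
            scores[entity] = scores.get(entity, 0.0) + strength
    return sorted(scores, key=scores.get, reverse=True)
```

With an ambiguous query word such as "Java", the other query words act as context: pairing it with "coffee" surfaces geographic concepts, while pairing it with "programming" surfaces technical ones.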
Relation extraction between related concepts by combining Wikipedia and web information for Japanese language Natural Language Processing
Lecture Notes in Computer Science English 2010 Construction of a huge scale ontology covering many named entities, domain-specific terms and relations among these concepts is one of the essential technologies for the next generation Web based on semantics. Recently, a number of studies have proposed automated ontology construction methods using the wide coverage of concepts in Wikipedia. However, since they tried to extract formal relations such as is-a and a-part-of relations, the generated ontologies have only a narrow coverage of the relations among concepts. In this work, we aim at automated ontology construction with a wide coverage of both concepts and their relations by combining information on the Web with Wikipedia. We propose a relation extraction method which receives pairs of co-related concepts from an association thesaurus extracted from Wikipedia and extracts their relations from the Web. 0 0
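A rough sketch of pattern-based relation extraction for one concept pair follows; here `snippets` stands in for text retrieved from the Web, and the regular-expression pattern is an assumed simplification of whatever extraction rules the paper actually uses:

```python
import re
from collections import Counter

def extract_relations(concept_a, concept_b, snippets):
    """Take the phrase appearing between the two concepts in each
    snippet as a candidate relation label, and rank candidates by
    frequency (most frequent first)."""
    pattern = re.compile(re.escape(concept_a) + r"\s+(.+?)\s+" + re.escape(concept_b))
    counts = Counter(m.group(1) for s in snippets for m in pattern.finditer(s))
    return counts.most_common()
```

Feeding pairs from a Wikipedia-derived association thesaurus into such an extractor is what lets the combined approach cover informal relations beyond is-a and part-of.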
Concept vector extraction from Wikipedia category network Wikipedia
Concept vector
Web mining
ICUIMC English 2009 0 0
Improving the extraction of bilingual terminology from Wikipedia Bilingual dictionary
Data mining
Link analysis
ACM Trans. Multimedia Comput. Commun. Appl. English 2009 Research on the automatic construction of bilingual dictionaries has achieved impressive results. Bilingual dictionaries are usually constructed from parallel corpora, but since these corpora are available only for selected text domains and language pairs, the potential of other resources is being explored as well. In this article, we want to further pursue the idea of using Wikipedia as a corpus for bilingual terminology extraction. We propose a method that extracts term-translation pairs from different types of Wikipedia link information. After that, an SVM classifier trained on the features of manually labeled training data determines the correctness of unseen term-translation pairs. 0 0
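A minimal sketch of the first step, candidate pair extraction from link information, is shown below; only interlanguage links are used here, and the `interlang` mapping is an assumed input (the paper draws on several link types, and an SVM classifier then filters the candidates):

```python
def candidate_translation_pairs(titles, interlang):
    """Extract candidate term-translation pairs from interlanguage
    links: an article title and the title of its counterpart in the
    other language edition form one candidate. A trained classifier
    would subsequently accept or reject each candidate."""
    pairs = []
    for title in titles:
        if title in interlang:
            pairs.append((title, interlang[title]))
    return pairs
```

Redirect titles and anchor texts would add further candidates (and noise), which is precisely why a downstream classifier is needed.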
Wikipedia Relatedness Measurement Methods and Influential Features WAINA English 2009 0 0
A bilingual dictionary extracted from the Wikipedia link structure DASFAA English 2008 0 0
A search engine for browsing the Wikipedia thesaurus DASFAA English 2008 0 0
An approach for extracting bilingual terminology from Wikipedia DASFAA English 2008 0 1
Association thesaurus construction methods based on link co-occurrence analysis for Wikipedia English 2008 Wikipedia, a huge scale Web-based encyclopedia, attracts great attention as an invaluable corpus for knowledge extraction because it has various impressive characteristics such as a huge number of articles, live updates, a dense link structure, brief anchor texts and URL identification for concepts. We have already shown that Wikipedia can be used to construct a huge scale, accurate association thesaurus. The association thesaurus we constructed covers almost 1.3 million concepts, and its accuracy was demonstrated in detailed experiments. However, we still need scalable methods to analyze the huge number of Web pages and hyperlinks among articles in the Web-based encyclopedia. 0 0
Constructing a global ontology by concept mapping using Wikipedia thesaurus Proceedings - International Conference on Advanced Information Networking and Applications, AINA English 2008 Recently, the importance of semantics on the WWW has become widely recognized, and a lot of semantic information (RDF, OWL etc.) is being built/published on the WWW. However, the lack of ontology mappings is a serious problem for the Semantic Web, since it needs well defined relations to retrieve information correctly by inferring the meaning of information. One-to-one mapping is not efficient due to the distributed nature of the environment. Therefore, a reasonable approach is to map concepts through a large-scale intermediate ontology. On the other hand, Wikipedia is a large-scale concept network covering almost all concepts in the real world. In this paper, we propose an intermediate ontology construction method using Wikipedia Thesaurus, an association thesaurus extracted from Wikipedia. Since Wikipedia Thesaurus provides associated concepts without explicit relation types, we propose an approach of concept mapping using two sub-methods: "name mapping" and "logic-based mapping". 0 0
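The "name mapping" sub-method can plausibly be sketched as a normalized label match between local ontology concepts and Wikipedia Thesaurus concepts; the normalization rules below are assumptions for illustration:

```python
def name_mapping(local_concepts, wikipedia_concepts):
    """Sketch of "name mapping": link a local ontology concept to a
    Wikipedia concept when their normalized labels match exactly.
    "Logic-based mapping" would handle concepts left unmapped."""
    def norm(s):
        return s.strip().lower().replace("_", " ")
    wiki_index = {norm(c): c for c in wikipedia_concepts}
    return {c: wiki_index[norm(c)] for c in local_concepts if norm(c) in wiki_index}
```

Because every local ontology is mapped to the same Wikipedia-derived intermediate ontology, any two of them become comparable without pairwise one-to-one mappings.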
Wikipedia Mining for Huge Scale Japanese Association Thesaurus Construction Data mining
Association Thesaurus
Link Structure Mining
AINAW English 2008 0 0
Wikipedia Mining: Wikipedia as a Corpus for Knowledge Extraction Wikimania English 2008 Wikipedia, a collaborative Wiki-based encyclopedia, has become a huge phenomenon among Internet users. It covers a huge number of concepts of various fields such as Arts, Geography, History, Science, Sports and Games. As a corpus for knowledge extraction, Wikipedia's impressive characteristics are not limited to its scale, but also include the dense link structure, word sense disambiguation based on URLs and brief anchor texts. Because of these characteristics, Wikipedia has become a promising corpus and a big frontier for researchers. A considerable number of studies on Wikipedia Mining, such as semantic relatedness measurement, bilingual dictionary construction, and ontology construction, have been conducted. In this paper, we take a comprehensive, panoramic view of Wikipedia as a Web corpus, since almost all previous studies exploit only parts of Wikipedia's characteristics. The contribution of this paper is three-fold. First, we unveil the characteristics of Wikipedia as a corpus for knowledge extraction in detail. In particular, we place special emphasis on anchor texts, since they are helpful for both disambiguation and synonym extraction. Second, we introduce some of our Wikipedia mining research as well as research conducted by others in order to prove the worth of Wikipedia. Finally, we discuss possible directions of Wikipedia research. 0 0
Wikipedia link structure and text mining for semantic relation extraction towards a huge scale global web ontology CEUR Workshop Proceedings English 2008 Wikipedia, a collaborative Wiki-based encyclopedia, has become a huge phenomenon among Internet users. It covers a huge number of concepts of various fields such as Arts, Geography, History, Science, Sports and Games. Since it is becoming a database storing all human knowledge, Wikipedia mining is a promising approach that bridges the Semantic Web and the Social Web (a.k.a. Web 2.0). In fact, previous research on Wikipedia mining has strongly demonstrated that Wikipedia has a remarkable capability as a corpus for knowledge extraction, especially for relatedness measurement among concepts. However, semantic relatedness is just a numerical strength of a relation and does not carry an explicit relation type. To extract inferable semantic relations with explicit relation types, we need to analyze not only the link structure but also the texts in Wikipedia. In this paper, we propose a consistent approach of semantic relation extraction from Wikipedia. The method consists of three sub-processes highly optimized for Wikipedia mining: 1) fast preprocessing, 2) POS (Part Of Speech) tag tree analysis, and 3) mainstay extraction. Furthermore, our detailed evaluation proved that link structure mining improves both the accuracy and the scalability of semantic relation extraction. 0 0
Wikipedia mining for huge scale Japanese association thesaurus construction Proceedings - International Conference on Advanced Information Networking and Applications, AINA English 2008 Wikipedia, a huge scale Web-based dictionary, is an impressive corpus for knowledge extraction. We already proved that Wikipedia can be used for constructing an English association thesaurus and our link structure mining method is significantly effective for this aim. However, we want to find out how we can apply this method to other languages and what the requirements, differences and characteristics are. Nowadays, Wikipedia supports more than 250 languages such as English, German, French, Polish and Japanese. Among Asian languages, the Japanese Wikipedia is the largest corpus in Wikipedia. In this research, therefore, we analyzed all Japanese articles in Wikipedia and constructed a huge scale Japanese association thesaurus. After constructing the thesaurus, we realized that it shows several impressive characteristics depending on language and culture. 0 0
Wikipedia Mining for an Association Web Thesaurus Construction Dblp
Thesaurus wikipedia
Web Information Systems Engineering (WISE) 2007 France 2007 Wikipedia has become a huge phenomenon on the WWW. As a corpus for knowledge extraction, it has various impressive characteristics such as a huge number of articles, live updates, a dense link structure, brief link texts and URL identification for concepts. In this paper, we propose an efficient link mining method, pfibf (Path Frequency - Inversed Backward link Frequency), and an extension, “forward / backward link weighting (FB weighting)”, in order to construct a huge scale association thesaurus. We proved the effectiveness of our proposed methods compared with conventional methods such as co-occurrence analysis and TF-IDF. 0 0
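By analogy with TF-IDF, pfibf can be read as a path frequency between two articles weighted by the inverse of the target's backward-link count; the exact definition is given in the paper, so the following toy version is only an assumed reading:

```python
import math

def pfibf(links, source, target):
    """Toy pfibf on a {article: set_of_outgoing_links} graph: path
    frequency (here just the number of length-1 and length-2 forward
    paths from source to target) weighted by the inversed backward-
    link frequency log(N / bf(target))."""
    n = len(links)
    # path frequency: direct link plus two-step paths
    pf = int(target in links.get(source, set()))
    pf += sum(1 for mid in links.get(source, set())
              if target in links.get(mid, set()))
    # backward-link frequency: how many articles link to the target
    bf = sum(1 for outs in links.values() if target in outs)
    if pf == 0 or bf == 0:
        return 0.0
    return pf * math.log(n / bf)
```

As with IDF, the backward-link term down-weights heavily linked hub articles, so a concept reachable by many paths still scores low if everything links to it.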
Wikipedia mining for an association web thesaurus construction WISE English 2007 0 0