Kotaro Nakayama

From WikiPapers

Kotaro Nakayama is an author.

Publications

Only those publications related to wikis are shown here.

MIGSOM: A SOM algorithm for large scale hyperlinked documents inspired by neuronal migration
Keywords: Clustering, Link analysis, SOM, Visualisation, Wikipedia
Published in: Lecture Notes in Computer Science (English, 2014)
Abstract: The SOM (Self Organizing Map), one of the most popular unsupervised machine learning algorithms, maps high-dimensional vectors onto a low-dimensional map (usually 2-dimensional). The SOM is widely known as a "scalable" algorithm because of its capability to handle large numbers of records, but it is effective only when the vectors are small and dense. Although a number of studies on making the SOM scalable have been conducted, technical issues of scalability and performance for sparse high-dimensional data such as hyperlinked documents remain. In this paper, we introduce MIGSOM, an SOM algorithm inspired by a recent discovery on neuronal migration. The two major advantages of MIGSOM are its scalability for sparse high-dimensional data and its clustering visualization functionality. We describe the algorithm and implementation in detail, and show the practicality of the algorithm in several experiments, applying MIGSOM not only to experimental data sets but also to a large scale real data set: Wikipedia's hyperlink data.
R: 0, C: 0

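MIGSOM itself is not specified in enough detail in the abstract to reproduce, but the classic SOM update it builds on is compact. The following is a minimal NumPy sketch of standard SOM training under illustrative parameter choices (grid size, decay schedule); it is not the authors' MIGSOM implementation:

```python
import numpy as np

def train_som(data, grid=(10, 10), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Classic SOM: pull the best-matching unit (BMU) and its grid
    neighbours toward each input vector, with decaying rate/radius."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h, w, data.shape[1]))
    # Grid coordinates of every unit, for neighbourhood distances.
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w),
                                  indexing="ij"), axis=-1)
    n_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)              # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5  # decaying radius
            # BMU: the unit whose weight vector is closest to x.
            bmu = np.unravel_index(
                np.argmin(np.linalg.norm(weights - x, axis=2)), (h, w))
            # Gaussian neighbourhood influence around the BMU.
            d2 = ((coords - np.array(bmu)) ** 2).sum(axis=2)
            influence = np.exp(-d2 / (2 * sigma ** 2))
            weights += lr * influence[..., None] * (x - weights)
            step += 1
    return weights

# Toy run: map 200 random 16-dimensional vectors onto a 10x10 grid.
som = train_som(np.random.default_rng(1).random((200, 16)))
```

The expensive part for sparse high-dimensional inputs is the best-matching-unit search over dense weight vectors, which is exactly the scalability issue the paper targets.
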
A self organizing document map algorithm for large scale hyperlinked data inspired by neuronal migration
Keywords: Clustering, Link analysis, SOM, Visualisation, Wikipedia
Published in: Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011 (English, 2011)
Abstract: Web document clustering is a research topic pursued continuously because of its large variety of applications. Since Web documents usually vary widely in domain, content and quality, one of the technical difficulties is to find a reasonable number and size of clusters. In this research, we pay attention to SOMs (Self Organizing Maps) because of their capability for visualized clustering, which helps users investigate the characteristics of the data in detail. The SOM is widely known as a "scalable" algorithm because of its capability to handle large numbers of records, but it is effective only when the vectors are small and dense. Although several research efforts on making the SOM scalable have been conducted, technical issues of scalability and performance for sparse high-dimensional data such as hyperlinked documents remain. In this paper, we introduce MIGSOM, an SOM algorithm inspired by a recent discovery on neuronal migration. The two major advantages of MIGSOM are its scalability for sparse high-dimensional data and its clustering visualization functionality. We describe the algorithm and implementation, and show the practicality of the algorithm by applying MIGSOM to a huge scale real data set: Wikipedia's hyperlink data.
R: 0, C: 0

Calculating Wikipedia article similarity using machine translation evaluation metrics
Keywords: Bilingual dictionary, Cross-language Document Similarity, Data mining
Published in: Proceedings - 25th IEEE International Conference on Advanced Information Networking and Applications Workshops, WAINA 2011 (English, 2011)
Abstract: Calculating the similarity of Wikipedia articles in different languages is helpful for bilingual dictionary construction and various other research areas. However, standard methods for document similarity calculation are usually very simple. Therefore, we describe an approach of translating one Wikipedia article into the language of the other article, and then calculating article similarity with standard machine translation evaluation metrics. An experiment revealed that our approach is effective for identifying Wikipedia articles in different languages that cover the same concept.
R: 0, C: 0

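The recipe in this abstract (translate one article, then score the pair with a machine translation evaluation metric) is straightforward to sketch. Below, NLTK's sentence-level BLEU stands in for the metric, and translate is a hypothetical stand-in for whatever MT system performs the translation step; the paper does not commit to these specific choices:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def article_similarity(article_a, article_b, translate):
    """Translate article_b into article_a's language, then score the
    pair with BLEU as a cross-language similarity proxy."""
    hypothesis = translate(article_b).split()
    reference = [article_a.split()]
    # Smoothing avoids zero scores when some n-gram orders never match.
    return sentence_bleu(reference, hypothesis,
                         smoothing_function=SmoothingFunction().method1)

# Toy usage with an identity "translator" (hypothetical):
print(article_similarity("the cat sat on the mat",
                         "the cat sat on the mat",
                         translate=lambda text: text))  # 1.0
```
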
Semantic relatedness measurement based on Wikipedia link co-occurrence analysis
Keywords: Controlled language construction, Data mining, Semantics, Wiki
Published in: International Journal of Web Information Systems (English, 2011)
Abstract: Purpose: Recently, the importance and effectiveness of Wikipedia Mining has been shown in several studies. One popular research area in Wikipedia Mining focuses on semantic relatedness measurement, and work in this area has shown that Wikipedia can be used for semantic relatedness measurement. However, previous methods face two problems: accuracy and scalability. To solve these problems, the purpose of this paper is to propose an efficient semantic relatedness measurement method that leverages global statistical information of Wikipedia. Furthermore, a new test collection is constructed based on Wikipedia concepts for evaluating semantic relatedness measurement methods. Design/methodology/approach: The authors' approach leverages global statistical information of the whole of Wikipedia to compute semantic relatedness among concepts (disambiguated terms) by analyzing co-occurrences of link pairs in all Wikipedia articles. In Wikipedia, an article represents a concept and a link to another article represents a semantic relation between these two concepts. Thus, the co-occurrence of a link pair indicates the relatedness of a concept pair. Furthermore, the authors propose an integration method with tfidf as an improved method to additionally leverage local information in an article. For constructing a new test collection, the authors select a large number of concepts from Wikipedia; the relatedness of these concepts is judged by human test subjects. Findings: An experiment was conducted to evaluate the calculation cost and accuracy of each method. The experimental results show that the calculation cost of this approach is very low compared to one of the previous methods, and that it is more accurate than all previous methods for computing semantic relatedness. Originality/value: This is the first proposal of co-occurrence analysis of Wikipedia links for semantic relatedness measurement. The authors show that this approach is effective for measuring semantic relatedness among concepts with respect to both calculation cost and accuracy. The findings may be useful to researchers who are interested in knowledge extraction as well as ontology research.
R: 0, C: 0

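The core signal here (link pairs that co-occur in many articles indicate related concepts) is easy to sketch. The snippet below counts unordered link-pair co-occurrences over a corpus and scores a pair with a Dice coefficient; the paper's actual weighting and its tfidf integration are richer, so this is an illustrative reading rather than the published formula:

```python
from collections import Counter
from itertools import combinations

def build_counts(articles):
    """articles: iterable of sets of link targets (concept names)."""
    pair_counts, link_counts = Counter(), Counter()
    for links in articles:
        link_counts.update(links)
        # Every unordered pair of links in one article co-occurs once.
        pair_counts.update(frozenset(p) for p in combinations(links, 2))
    return pair_counts, link_counts

def relatedness(a, b, pair_counts, link_counts):
    """Dice coefficient over co-occurrence counts (illustrative)."""
    both = pair_counts[frozenset((a, b))]
    denom = link_counts[a] + link_counts[b]
    return 2 * both / denom if denom else 0.0

articles = [{"Tokyo", "Japan", "Sushi"}, {"Tokyo", "Japan"}, {"Sushi", "Rice"}]
pc, lc = build_counts(articles)
print(relatedness("Tokyo", "Japan", pc, lc))  # 2*2 / (2+2) = 1.0
```
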
Wikipedia sets: Context-oriented related entity acquisition from multiple words
Keywords: Association thesaurus, Bootstrapping, Context dependency, Data mining
Published in: Proceedings - 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011 (English, 2011)
Abstract: In this paper, we propose a method which acquires related words (entities) from multiple words by naturally disambiguating their meaning and considering their contexts. In addition, we introduce a bootstrapping method for improving the coverage of association relations. Experimental results show that, compared to the ESA-based baseline, our method can acquire related words that depend on the context of the multiple input words.
R: 0, C: 0

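The context effect described in this abstract can be illustrated with an association thesaurus: keep only candidates related to every input word, so the inputs disambiguate each other. This is a simplified sketch of that intuition with hypothetical relatedness strengths, not the paper's actual algorithm or its bootstrapping step:

```python
def related_to_all(seeds, thesaurus, top_k=5):
    """thesaurus: dict mapping a word to {related_word: strength}.
    Score each candidate by its weakest link to any seed, so only
    entities related to every input survive (the context effect)."""
    candidate_sets = [set(thesaurus.get(s, {})) for s in seeds]
    shared = set.intersection(*candidate_sets) - set(seeds)
    scored = {c: min(thesaurus[s][c] for s in seeds) for c in shared}
    return sorted(scored, key=scored.get, reverse=True)[:top_k]

# Hypothetical strengths: "Python" + "Java" together should surface
# programming languages rather than snakes or islands.
thesaurus = {
    "Python": {"snake": 0.9, "C++": 0.6, "Ruby": 0.5},
    "Java":   {"Indonesia": 0.8, "C++": 0.7, "Ruby": 0.4},
}
print(related_to_all(["Python", "Java"], thesaurus))  # ['C++', 'Ruby']
```
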
Relation extraction between related concepts by combining Wikipedia and web information for Japanese language
Keywords: Natural Language Processing, Ontology, Thesaurus
Published in: Lecture Notes in Computer Science (English, 2010)
Abstract: Construction of a huge scale ontology covering many named entities, domain-specific terms and the relations among these concepts is one of the essential technologies for the next generation of the semantics-based Web. Recently, a number of studies have proposed automated ontology construction methods that use the wide coverage of concepts in Wikipedia. However, since they tried to extract formal relations such as is-a and a-part-of relations, the generated ontologies cover only a narrow range of the relations among concepts. In this work, we aim at automated ontology construction with a wide coverage of both concepts and their relations by combining information on the Web with Wikipedia. We propose a relation extraction method which receives pairs of co-related concepts from an association thesaurus extracted from Wikipedia and extracts their relations from the Web.
R: 0, C: 0

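One simple way to picture the final step (extracting the relation between an already-related concept pair from Web text) is to collect sentences mentioning both concepts and keep the connecting phrase as a relation label. The sketch below works over an in-memory list of sentences and uses a deliberately naive pattern; fetching Web text and the paper's actual Japanese-language extraction are out of scope here:

```python
import re

def extract_relations(pair, sentences):
    """For a co-related concept pair (a, b), return the phrases that
    connect them in sentences mentioning both (naive pattern)."""
    a, b = pair
    pattern = re.compile(re.escape(a) + r"\s+(.{1,40}?)\s+" + re.escape(b))
    relations = []
    for sentence in sentences:
        match = pattern.search(sentence)
        if match:
            relations.append(match.group(1))
    return relations

sentences = [
    "Kyoto was the capital of Japan for over a thousand years.",
    "Kyoto is a city in Japan famous for its temples.",
]
print(extract_relations(("Kyoto", "Japan"), sentences))
# ['was the capital of', 'is a city in']
```
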
Concept vector extraction from Wikipedia category network
Keywords: Wikipedia, Categorization, Concept vector, Web mining
Published in: ICUIMC (English, 2009)
R: 0, C: 0

Improving the extraction of bilingual terminology from Wikipedia
Keywords: Bilingual dictionary, Data mining, Link analysis
Published in: ACM Trans. Multimedia Comput. Commun. Appl. (English, 2009)
Abstract: Research on the automatic construction of bilingual dictionaries has achieved impressive results. Bilingual dictionaries are usually constructed from parallel corpora, but since these corpora are available only for selected text domains and language pairs, the potential of other resources is being explored as well. In this article, we want to further pursue the idea of using Wikipedia as a corpus for bilingual terminology extraction. We propose a method that extracts term-translation pairs from different types of Wikipedia link information. After that, an SVM classifier trained on the features of manually labeled training data determines the correctness of unseen term-translation pairs.
R: 0, C: 0

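The second stage described here (an SVM deciding whether a candidate term-translation pair is correct) is standard supervised classification. A minimal scikit-learn sketch with invented features; the paper's actual feature set, derived from different types of Wikipedia link information, is richer:

```python
from sklearn.svm import SVC

# Hypothetical features per candidate term-translation pair, e.g.:
# [from_interlanguage_link, from_redirect, log_cooccurrence_count]
X_train = [
    [1, 0, 3.2],  # manually labeled correct pairs ...
    [1, 1, 2.5],
    [0, 0, 0.4],  # ... and labeled incorrect pairs
    [0, 1, 0.9],
]
y_train = [1, 1, 0, 0]  # 1 = correct translation pair

clf = SVC(kernel="rbf").fit(X_train, y_train)

# Determine the correctness of unseen term-translation pairs.
print(clf.predict([[1, 0, 2.8], [0, 0, 0.2]]))  # e.g. [1 0]
```
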
Wikipedia Relatedness Measurement Methods and Influential Features
Published in: WAINA (English, 2009)
R: 0, C: 0

A Search Engine for Browsing the Wikipedia Thesaurus
Keywords: Data mining, Association Thesaurus, Link Structure Analysis, XML Web Services
Published in: 13th International Conference on Database Systems for Advanced Applications, Demo session (DASFAA) (English, 2008)
Abstract: Wikipedia has become a huge phenomenon on the WWW. As a corpus for knowledge extraction, it has various impressive characteristics such as a huge number of articles, live updates, a dense link structure, brief link texts and URL identification for concepts. In our previous work, we proposed link structure mining algorithms to extract a huge scale and accurate association thesaurus from Wikipedia. The association thesaurus covers almost 1.3 million concepts, and its accuracy was demonstrated in detailed experiments. To prove its practicality, we implemented three features on top of the association thesaurus: a search engine for browsing the Wikipedia Thesaurus, an XML Web service for the thesaurus, and a Semantic Web support feature. We show these features in this demonstration.
R: 0, C: 0

A bilingual dictionary extracted from the Wikipedia link structure
Published in: DASFAA (English, 2008)
R: 0, C: 0

An approach for extracting bilingual terminology from Wikipedia
Published in: DASFAA (English, 2008)
R: 0, C: 1

Association thesaurus construction methods based on link co-occurrence analysis for Wikipedia (English, 2008)
Abstract: Wikipedia, a huge scale Web-based encyclopedia, attracts great attention as an invaluable corpus for knowledge extraction because of its various impressive characteristics: a huge number of articles, live updates, a dense link structure, brief anchor texts and URL identification for concepts. We have already shown that Wikipedia can be used to construct a huge scale, accurate association thesaurus. The association thesaurus we constructed covers almost 1.3 million concepts, and its accuracy was demonstrated in detailed experiments. However, we still need scalable methods to analyze the huge number of Web pages and the hyperlinks among articles in the Web-based encyclopedia.
R: 0, C: 0

Constructing a global ontology by concept mapping using Wikipedia thesaurus
Keywords: Data mining, Association Thesaurus, Ontology Mapping, Global Ontology
Published in: Proceedings - International Conference on Advanced Information Networking and Applications, AINA (English, 2008)
Abstract: Recently, the importance of semantics on the WWW has been widely recognized, and a lot of semantic information (RDF, OWL, etc.) is being built/published on the WWW. However, the lack of ontology mappings is a serious problem for the Semantic Web, since it needs well defined relations to retrieve information correctly by inferring the meaning of information. One-to-one mapping is not efficient in a distributed environment, so a workable approach is to map concepts via a large-scale intermediate ontology. Wikipedia, on the other hand, is a large-scale concept network covering almost all concepts in the real world. In this paper, we propose an intermediate ontology construction method using Wikipedia Thesaurus, an association thesaurus extracted from Wikipedia. Since Wikipedia Thesaurus provides associated concepts without explicit relation types, we propose an approach of concept mapping using two sub-methods: "name mapping" and "logic-based mapping".
R: 0, C: 0

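Of the two sub-methods, "name mapping" is the easier to picture: align a concept from an external ontology with a Wikipedia Thesaurus concept whose normalized label matches. A toy sketch under that assumption; the paper's matching and the "logic-based mapping" step are more involved:

```python
def normalize(label):
    """Case-fold and strip underscores/punctuation for label matching."""
    cleaned = label.lower().replace("_", " ")
    return "".join(ch for ch in cleaned if ch.isalnum() or ch == " ").strip()

def name_mapping(ontology_concepts, thesaurus_concepts):
    """Map each external concept to the thesaurus concept whose
    normalized label matches exactly (toy version of name mapping)."""
    index = {normalize(t): t for t in thesaurus_concepts}
    return {c: index[normalize(c)]
            for c in ontology_concepts if normalize(c) in index}

thesaurus = ["Tokyo", "Kyoto", "Mount Fuji"]
print(name_mapping(["tokyo", "Mount_Fuji", "Osaka"], thesaurus))
# {'tokyo': 'Tokyo', 'Mount_Fuji': 'Mount Fuji'}
```
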
Extracting structured knowledge for Semantic Web by mining Wikipedia
Published in: CEUR Workshop Proceedings (English, 2008)
Abstract: Since Wikipedia has become a huge scale database storing a wide range of human knowledge, it is a promising corpus for knowledge extraction. A considerable number of studies on Wikipedia mining have been conducted, and the fact that Wikipedia is an invaluable corpus has been confirmed. Wikipedia's impressive characteristics are not limited to its scale, but also include its dense link structure, URIs for word sense disambiguation, well structured Infoboxes, and the category tree. One popular approach in Wikipedia Mining is to use Wikipedia's category tree as an ontology, and a number of researchers have shown significant results supporting Wikipedia's categories as promising resources for ontology construction. In this work, we try to prove the capability of Wikipedia as a corpus for knowledge extraction and how it works in the Semantic Web environment. We show two achievements: Wikipedia Thesaurus, a huge scale association thesaurus built by mining Wikipedia's link structure, and Wikipedia Ontology, a Web ontology extracted by mining Wikipedia articles.
R: 0, C: 0

Wikipedia Mining: Wikipedia as a Corpus for Knowledge Extraction
Published in: Wikimania (English, 2008)
Abstract: Wikipedia, a collaborative Wiki-based encyclopedia, has become a huge phenomenon among Internet users. It covers a huge number of concepts of various fields such as Arts, Geography, History, Science, Sports and Games. As a corpus for knowledge extraction, Wikipedia's impressive characteristics are not limited to its scale, but also include its dense link structure, URL-based word sense disambiguation and brief anchor texts. Because of these characteristics, Wikipedia has become a promising corpus and a big frontier for researchers. A considerable number of studies on Wikipedia Mining, such as semantic relatedness measurement, bilingual dictionary construction, and ontology construction, have been conducted. In this paper, we take a comprehensive, panoramic view of Wikipedia as a Web corpus, since almost all previous studies exploit only parts of Wikipedia's characteristics. The contribution of this paper is threefold. First, we unveil the characteristics of Wikipedia as a corpus for knowledge extraction in detail. In particular, we describe the importance of anchor texts with special emphasis, since they are helpful information for both disambiguation and synonym extraction. Second, we introduce some of our Wikipedia mining work as well as work conducted by other researchers in order to prove the worth of Wikipedia. Finally, we discuss possible directions of Wikipedia research.
R: 0, C: 0

Wikipedia link structure and text mining for semantic relation extraction towards a huge scale global web ontology
Published in: CEUR Workshop Proceedings (English, 2008)
Abstract: Wikipedia, a collaborative Wiki-based encyclopedia, has become a huge phenomenon among Internet users. It covers a huge number of concepts of various fields such as Arts, Geography, History, Science, Sports and Games. Since it is becoming a database storing all human knowledge, Wikipedia mining is a promising approach that bridges the Semantic Web and the Social Web (a.k.a. Web 2.0). In fact, previous research on Wikipedia mining has convincingly shown that Wikipedia has a remarkable capability as a corpus for knowledge extraction, especially for relatedness measurement among concepts. However, semantic relatedness is just a numerical strength of a relation and does not carry an explicit relation type. To extract inferable semantic relations with explicit relation types, we need to analyze not only the link structure but also the texts in Wikipedia. In this paper, we propose a consistent approach for semantic relation extraction from Wikipedia. The method consists of three sub-processes highly optimized for Wikipedia mining: 1) fast preprocessing, 2) POS (Part Of Speech) tag tree analysis, and 3) mainstay extraction. Furthermore, our detailed evaluation proved that link structure mining improves both the accuracy and the scalability of semantic relation extraction.
R: 0, C: 0

Wikipedia mining for huge scale Japanese association thesaurus construction
Keywords: Data mining, Association Thesaurus, Link Structure Mining
Published in: Proceedings - International Conference on Advanced Information Networking and Applications, AINA (English, 2008)
Abstract: Wikipedia, a huge scale Web-based dictionary, is an impressive corpus for knowledge extraction. We have already shown that Wikipedia can be used to construct an English association thesaurus and that our link structure mining method is significantly effective for this aim. However, we want to find out how we can apply this method to other languages, and what the requirements, differences and characteristics are. Nowadays, Wikipedia supports more than 250 languages such as English, German, French, Polish and Japanese. Among Asian languages, the Japanese Wikipedia is the largest corpus in Wikipedia. In this research, therefore, we analyzed all Japanese articles in Wikipedia and constructed a huge scale Japanese association thesaurus. After constructing the thesaurus, we found that it shows several impressive characteristics depending on language and culture.
R: 0, C: 0

Wikipedia mining for triple extraction enhanced by co-reference resolution
Published in: CEUR Workshop Proceedings (English, 2008)
Abstract: Since Wikipedia has become a huge scale database storing a wide range of human knowledge, it is a promising corpus for knowledge extraction. A considerable number of studies on Wikipedia mining have been conducted, and the fact that Wikipedia is an invaluable corpus has been confirmed. Wikipedia's impressive characteristics are not limited to its scale, but also include its dense link structure, URIs for word sense disambiguation, well structured Infoboxes, and the category tree. In previous research in this area, the category tree has been widely used to extract semantic relations among concepts on Wikipedia. In this paper, we try to extract triples (Subject, Predicate, Object) from Wikipedia articles, another promising resource for knowledge extraction. We propose a practical method which integrates link structure mining and parsing to enhance the extraction accuracy. The proposed method consists of two technical novelties: two parsing strategies and a co-reference resolution method.
R: 0, C: 0

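Neither parsing nor co-reference resolution fits in a snippet, but the target output, (Subject, Predicate, Object) triples, is easy to show with a naive pattern-based stand-in. The resolve hook marks where a real co-reference resolver would replace pronouns; everything else is a rough substitute for the paper's parsing strategies:

```python
import re

# Naive copula pattern: "<Subject> is/was (a/an/the) <Object>."
COPULA = re.compile(
    r"^(?P<subj>[A-Z][\w ]*?) (?P<pred>is|was) (?:an? |the )?(?P<obj>[\w ]+)\.$")

def extract_triples(sentences, resolve=lambda s: s):
    """Return (Subject, Predicate, Object) triples from sentences.
    `resolve` is the hook where co-reference resolution plugs in."""
    triples = []
    for sentence in map(resolve, sentences):
        match = COPULA.match(sentence)
        if match:
            triples.append(
                (match.group("subj"), match.group("pred"), match.group("obj")))
    return triples

sentences = ["Tokyo is the capital of Japan.", "It was a fishing village."]
# Fake the co-reference step ("It" -> "Tokyo") to show where it fits.
print(extract_triples(sentences, resolve=lambda s: s.replace("It ", "Tokyo ")))
# [('Tokyo', 'is', 'capital of Japan'), ('Tokyo', 'was', 'fishing village')]
```
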
A Thesaurus Construction Method from Large Scale Web Dictionaries
Keywords: Data mining, Association Thesaurus, Link Structure Analysis, Link Text, Synonyms
Published in: 21st IEEE International Conference on Advanced Information Networking and Applications (AINA) (2007)
Abstract: Web-based dictionaries, such as Wikipedia, have become dramatically popular among Internet users in the past several years. The important characteristics of a Web-based dictionary are not only the huge number of articles but also the hyperlinks, which carry more information than their basic function of navigation between pages. In this paper, we propose an efficient method to analyze the link structure of Web-based dictionaries to construct an association thesaurus. We have already applied it to Wikipedia, a huge scale Web-based dictionary with a dense link structure, as a corpus. We developed a search engine for evaluation, then conducted a number of experiments to compare our method with traditional methods such as co-occurrence analysis.
R: 0, C: 0

Wikipedia Mining for an Association Web Thesaurus Construction
Keywords: Dblp, Thesaurus wikipedia
Published in: Web Information Systems Engineering (WISE) 2007, France (English, 2007)
Abstract: Wikipedia has become a huge phenomenon on the WWW. As a corpus for knowledge extraction, it has various impressive characteristics such as a huge number of articles, live updates, a dense link structure, brief link texts and URL identification for concepts. In this paper, we propose an efficient link mining method, pfibf (Path Frequency - Inversed Backward link Frequency), and its extension, "forward / backward link weighting (FB weighting)", in order to construct a huge scale association thesaurus. We demonstrated the effectiveness of our proposed methods compared with conventional methods such as co-occurrence analysis and TF-IDF.
R: 0, C: 0

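The pfibf measure parallels tf-idf: a pair of articles is strongly related when many link paths connect them (pf) while the target is not linked from everywhere (ibf, an inverse backward-link-frequency damping term, assumed here to be log(N / bf)). The sketch below is a simplified one-step version that counts only direct links and omits the FB weighting, so treat the exact formula as an assumption rather than the paper's definition:

```python
import math
from collections import defaultdict

def pfibf_scores(links, n_articles):
    """links: iterable of (source, target) hyperlink pairs.
    Scores directly linked concept pairs: pf is simplified to a
    single direct path; ibf damps heavily back-linked hub targets."""
    forward, backlinks = defaultdict(set), defaultdict(int)
    for src, dst in links:
        forward[src].add(dst)
        backlinks[dst] += 1
    return {(src, dst): math.log(n_articles / backlinks[dst])
            for src, dsts in forward.items() for dst in dsts}

links = [("Kyoto", "Japan"), ("Tokyo", "Japan"), ("Kyoto", "Tea ceremony")]
print(pfibf_scores(links, n_articles=4))
# "Tea ceremony" (one backlink) outscores the hub "Japan" (two).
```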