Jun'ichi Kazama

From WikiPapers

Jun'ichi Kazama is an author.

Publications

Only those publications related to wikis are shown here.
Each entry lists the title, keywords (where available), venue, language, year, and abstract.

Generating information-rich taxonomy from Wikipedia
Published in: 2010 4th International Universal Communication Symposium, IUCS 2010 - Proceedings (English, 2010)
Abstract: Even though hyponymy relation acquisition has been extensively studied, "how informative such acquired hyponymy relations are" has not been sufficiently discussed. We found that the hypernyms in automatically acquired hyponymy relations were often too vague or ambiguous to specify the meaning of their hyponyms. For instance, the hypernym work is vague and ambiguous in the hyponymy relations work/Avatar and work/The Catcher in the Rye. In this paper, we propose a simple method of generating intermediate concepts of hyponymy relations that can make such vague hypernyms more specific. Our method generates an information-rich hyponymy relation such as work / work by film director / work by James Cameron / Avatar from the less informative relation work/Avatar. Furthermore, the generated relation work by film director/Avatar can be paraphrased into a new relation movie/Avatar. Experiments showed that our method acquired 2,719,441 enriched hyponymy relations with one intermediate concept at 0.853 precision and another 6,347,472 hyponymy relations at 0.786 precision.
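
As a rough illustration of the intermediate-concept generation described in this abstract, the sketch below builds the enriched chain from a vague hypernym, a hyponym, and an attribute/value pair such as one taken from the hyponym's Wikipedia infobox. The data and the string templates are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of intermediate-concept generation: given a vague hypernym,
# a hyponym, and attribute/value pairs for the hyponym (e.g. from its Wikipedia
# infobox), build an enriched chain like the paper's example. Hypothetical data.
def generate_intermediate_concepts(hypernym, hyponym, attributes):
    """Build chains hypernym -> 'hypernym by attribute' -> 'hypernym by value' -> hyponym."""
    chains = []
    for attribute, value in attributes.items():
        chains.append([
            hypernym,                        # vague hypernym, e.g. "work"
            f"{hypernym} by {attribute}",    # intermediate concept, e.g. "work by film director"
            f"{hypernym} by {value}",        # more specific concept, e.g. "work by James Cameron"
            hyponym,                         # original hyponym, e.g. "Avatar"
        ])
    return chains

if __name__ == "__main__":
    # Hypothetical infobox-style attribute for the hyponym "Avatar".
    for chain in generate_intermediate_concepts("work", "Avatar", {"film director": "James Cameron"}):
        print(" / ".join(chain))
        # work / work by film director / work by James Cameron / Avatar
```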

Hypernym discovery based on distributional similarity and hierarchical structures
Published in: EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009 (English, 2009)
Abstract: This paper presents a new method of developing a large-scale hyponymy relation database by combining Wikipedia and other Web documents. We attach new words to the hyponymy database extracted from Wikipedia by using distributional similarity calculated from documents on the Web. For a given target word, our algorithm first finds k similar words from the Wikipedia database. Then, the hypernyms of these k similar words are assigned scores by considering the distributional similarities and hierarchical distances in the Wikipedia database. Finally, new hyponymy relations are output according to the scores. In this paper, we tested two distributional similarities. One is based on raw verb-noun dependencies (which we call "RVD"), and the other is based on a large-scale clustering of verb-noun dependencies (called "CVD"). Our method achieved an attachment accuracy of 91.0% for the top 10,000 relations and 74.5% for the top 100,000 relations when using CVD, a far better outcome than the baseline approaches. Excluding the region with very high scores, CVD was found to be more effective than RVD. We also confirmed that most relations extracted by our method cannot be extracted merely by applying well-known lexico-syntactic patterns to Web documents.
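
The attachment step can be sketched as follows. The similarity function, the toy hierarchy, and the way similarity and hierarchical distance are combined into a score are illustrative assumptions; the paper defines its own scoring.

```python
# Minimal sketch: rank candidate hypernyms for a new target word by scoring the
# ancestors of its k most distributionally similar known words.
from collections import defaultdict

def attach_new_word(target, known_words, similarity, ancestors_with_depth, k=10):
    """Return candidate hypernyms for `target`, best first.

    similarity(a, b)            -> distributional similarity score
    ancestors_with_depth(word)  -> [(ancestor, hierarchical distance), ...]
    """
    # 1. Find the k known words most similar to the target.
    neighbours = sorted(known_words, key=lambda w: similarity(target, w), reverse=True)[:k]

    # 2. Score each ancestor of each neighbour: high similarity and small
    #    hierarchical distance both increase the score (illustrative weighting).
    scores = defaultdict(float)
    for word in neighbours:
        sim = similarity(target, word)
        for ancestor, distance in ancestors_with_depth(word):
            scores[ancestor] += sim / (1.0 + distance)

    # 3. Output candidate hypernyms ordered by score.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    # Toy Wikipedia-style hierarchy and similarity, for illustration only.
    toy_ancestors = {
        "Avatar": [("film", 1), ("work", 2)],
        "Titanic": [("film", 1), ("work", 2)],
        "Prague": [("city", 1), ("place", 2)],
    }

    def toy_similarity(a, b):
        return 1.0 if {a, b} <= {"Aliens", "Avatar", "Titanic"} else 0.1

    ranked = attach_new_word("Aliens", list(toy_ancestors), toy_similarity,
                             toy_ancestors.get, k=2)
    print(ranked)  # "film" should outrank "work" for the new word "Aliens"
```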

Enriching Multilingual Language Resources by Discovering Missing Cross-Language Links in Wikipedia
Keywords: Web mining, Wikipedia, Cross-Language Links, Language resources
Published in: Proceedings - 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008 (WI-IAT, English, 2008)
Abstract: We present a novel method for discovering missing cross-language links between English and Japanese Wikipedia articles. We collect candidates for missing cross-language links, i.e., pairs of English and Japanese Wikipedia articles that could be connected by cross-language links. Then we select the correct cross-language links among the candidates by using a classifier trained with various types of features. Our method has three desirable characteristics for discovering missing links. First, it can discover cross-language links with high accuracy (92% precision at 78% recall). Second, the features used in the classifier are language-independent. Third, without relying on any external knowledge, we generate the features from resources automatically obtained from Wikipedia. In this work, we discover approximately 10^5 missing cross-language links in Wikipedia, which is almost two-thirds as many as the existing cross-language links in Wikipedia.
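
The candidate-classification step might look roughly like the sketch below. The two features and the choice of logistic regression are illustrative assumptions, as the abstract only states that a classifier is trained with language-independent features derived from Wikipedia.

```python
# Illustrative sketch: turn candidate English/Japanese article pairs into
# language-independent feature vectors and train a binary classifier.
from sklearn.linear_model import LogisticRegression

def pair_features(en_article, ja_article):
    """Features for one candidate pair.

    Each article is assumed to be a dict with
      'title': the article title, and
      'links': its outgoing links mapped through existing cross-language links
               into a shared id space, so the English and Japanese sets are comparable.
    """
    shared = len(en_article["links"] & ja_article["links"])
    union = len(en_article["links"] | ja_article["links"]) or 1
    len_en, len_ja = len(en_article["title"]), len(ja_article["title"])
    return [
        shared / union,                                # overlap of (mapped) links
        min(len_en, len_ja) / max(len_en, len_ja, 1),  # title length ratio
    ]

def train_link_classifier(labelled_pairs):
    """labelled_pairs: [(en_article, ja_article, is_true_link), ...]"""
    X = [pair_features(en, ja) for en, ja, _ in labelled_pairs]
    y = [label for _, _, label in labelled_pairs]
    return LogisticRegression().fit(X, y)

def discover_missing_links(classifier, candidate_pairs):
    """Keep only the candidates the classifier accepts as genuine cross-language links."""
    return [(en, ja) for en, ja in candidate_pairs
            if classifier.predict([pair_features(en, ja)])[0]]
```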

Exploiting Wikipedia as external knowledge for named entity recognition
Published in: EMNLP-CoNLL 2007 - Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (English, 2007)
Abstract: We explore the use of Wikipedia as external knowledge to improve named entity recognition (NER). Our method retrieves the corresponding Wikipedia entry for each candidate word sequence and extracts a category label from the first sentence of the entry, which can be thought of as a definition. These category labels are used as features in a CRF-based NE tagger. We demonstrate on the CoNLL 2003 dataset that the Wikipedia category labels extracted by such a simple method actually improve the accuracy of NER.
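
The category-label extraction can be sketched as follows. The lookup table and the regular-expression heuristic for pulling a label out of the definition sentence are illustrative assumptions rather than the paper's exact procedure.

```python
# Illustrative sketch of the feature-extraction step: look up a candidate phrase,
# take the first (definition) sentence of its Wikipedia entry, and pull out a
# coarse category label for use as a feature in a CRF-based NE tagger.
import re

# Hypothetical stand-in for retrieving the first sentence of a Wikipedia entry.
FIRST_SENTENCES = {
    "Franz Kafka": "Franz Kafka was a novelist and short-story writer.",
    "Prague": "Prague is the capital city of the Czech Republic.",
}

def category_label(candidate):
    """Return a rough category label from the definition sentence, or None."""
    sentence = FIRST_SENTENCES.get(candidate)
    if sentence is None:
        return None
    # Rough heuristic: take the word that follows the copula and article
    # in a pattern like "X is/was a/an/the Y ...".
    match = re.search(r"\b(?:is|was)\s+(?:a|an|the)\s+([a-z]+)", sentence.lower())
    return match.group(1) if match else None

if __name__ == "__main__":
    for phrase in ["Franz Kafka", "Prague", "Unknown Phrase"]:
        # The label would be attached as a feature to each token of the phrase.
        print(phrase, "->", category_label(phrase))
```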