Takehito Utsuro

From WikiPapers
Jump to: navigation, search

Takehito Utsuro is an author.

Publications

Only those publications related to wikis are shown here.
Title Keyword(s) Published in Language DateThis property is a special property in this wiki. Abstract R C
Labeling blog posts with wikipedia entries through LDA-based topic modeling of wikipedia Blogs
LDA
Topic analysis
Topic model
Wikipedia
Journal of Internet Technology English 2013 Given a search query, most existing search engines simply return a ranked list of search results. However, it is often the case that those search result documents consist of a mixture of documents that are closely related to various contents. In order to address the issue of quickly overviewing the distribution of contents, this paper proposes a framework of labeling blog posts with Wikipedia entries through LDA (latent Dirichlet allocation) based topic modeling of Wikipedia. One of the most important advantages of this LDA-based document model is that the collected Wikipedia entries and their LDA parameters heavily depend on the distribution of keywords across all the search result of blog posts. This tendency actually contributes to quickly overviewing the search result of blog posts through the LDA-based topic distribution. We show that the LDA-based document retrieval scheme outperforms our previous approach. Finally, we compare the proposed approach to the standard LDA-based topic modeling without Wikipedia knowledge source. Both LDA-based topic modeling results have quite different nature and contribute to quickly overviewing the search result of blog posts in a quite complementary fashion. 0 0
LDA-based topic modeling in labeling blog posts with wikipedia entries Blogs
LDA
Topic Analysis
Topic Model
Wikipedia
Lecture Notes in Computer Science English 2012 Given a search query, most existing search engines simply return a ranked list of search results. However, it is often the case that those search result documents consist of a mixture of documents that are closely related to various contents. In order to address the issue of quickly overviewing the distribution of contents, this paper proposes a framework of labeling blog posts with Wikipedia entries through LDA (latent Dirichlet allocation) based topic modeling. More specifically, this paper applies an LDA-based document model to the task of labelling blog posts with Wikipedia entries. One of the most important advantages of this LDA-based document model is that the collected Wikipedia entries and their LDA parameters heavily depend on the distribution of keywords across all the search result of blog posts. This tendency actually contributes to quickly overviewing the search result of blog posts through the LDA-based topic distribution. In the evaluation of the paper, we also show that the LDA-based document retrieval scheme outperforms our previous approach. 0 0
Utilizing wikipedia in categorizing topic related blogs into facets Blogs
Faceted Search
Search Engine
Topic Analysis
Wikipedia
Procedia - Social and Behavioral Sciences English 2011 Given a search query, most existing search engines simply return a ranked list of search results. However, it is often the case that those search result documents consist of a mixture of documents that are closely related to various sub- topics. This paper proposes a framework of categorizing blog posts according to their sub-topics. In our framework, the sub-topic of each blog post is identified by utilizing Wikipedia entries as a knowledge source and each Wikipedia entry title is considered as a sub-topic label. We achieve to quickly overview the distribution of sub-topics over the whole collected blog posts. 0 0
Cross-lingual analysis of concerns and reports on crimes in blogs Blog feed retrieval
Crime reports
Cross-lingual blog analysis
Wikipedia
Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering English 2010 Among other domains and topics on which some issues are frequently argued in the blogosphere, the domain of crime is one of the most seriously discussed by various kinds of bloggers. Such information on crimes in blogs is especially valuable for outsiders from abroad who are not familiar with cultures and crimes in foreign countries. This paper proposes a framework of cross-lingually analyzing people's concerns, reports, and experiences on crimes in their own blogs. In the retrieval of blog feeds/posts, we take two approaches, focusing on various types of bloggers such as experts in the crime domain and victims of criminal acts. 0 0
Japanese/english blog distillation and cross-lingual blog analysis with multilingual wikipedia entries as fundamental knowledge source Blogs
Blog distillation
Cross-lingual blog analysis
Topic analysis
Wikipedia
Transactions of the Japanese Society for Artificial Intelligence Japanese 2010 The overall goal of this paper is to cross-lingually analyze multilingual blogs collected with a topic keyword. The framework of collecting multilingual blogs with a topic keyword is designed as the blog feed retrieval procedure. In this paper, we take an approach of collecting blog feeds rather than blog posts, mainly because we regard the former as a larger information unit in the blogosphere and prefer it as the information source for cross-lingual blog analysis. In the blog feed retrieval procedure, we also regard Wikipedia as a large scale ontological knowledge base for conceptually indexing the blogosphere. The underlying motivation of employing Wikipedia is in linking a knowledge base of well known facts and relatively neutral opinions with rather raw, user generated media like blogs, which include less well known facts and much more radical opinions. In our framework, first, in order to collect candidates of blog feeds for a given query, we use existing Web search engine APIs, which return a ranked list of blog posts, given a topic keyword. Next, we re-rank the list of blog feeds according to the number of hits of the topic keyword as well as closely related terms extracted from the Wikipedia entry in each blog feed. We compare the proposed blog feed retrieval method to existing Web search engine APIs and achieve significant improvement. We then apply the proposed blog distillation framework to the task of cross-lingually analyzing multilingual blogs collected with a topic keyword. Here, we cross-lingually and cross-culturally compare less well known facts and opinions that are closely related to a given topic. Results of cross-lingual blog analysis support the effectiveness of the proposed framework. 0 0
Linking topics of news and Blogs with Wikipedia for complementary navigation Blogs
IR
News
Topic analysis
Wikipedia
Lecture Notes in Computer Science English 2010 We study complementary navigation of news and blog, where Wikipedia entries are utilized as fundamental knowledge source for linking news articles and blog feeds/posts. In the proposed framework, given a topic as the title of a Wikipedia entry, its Wikipedia entry body text is analyzed as fundamental knowledge source for the given topic, and terms strongly related to the given topic are extracted. Those terms are then used for ranking news articles and blog posts. In the scenario of complementary navigation from a news article to closely related blog posts, Japanese Wikipedia entries are ranked according to the number of strongly related terms shared by the given news article and each Wikipedia entry. Then, top ranked 10 entries are regarded as indices for further retrieving closely related blog posts. The retrieved blog posts are finally ranked all together. The retrieved blog posts are then shown to users as blogs of personal opinions and experiences that are closely related to the given news article. In our preliminary evaluation, through an interface for manually selecting relevant Wikipedia entries, the rate of successfully retrieving relevant blog posts improved. 0 0
Linking topics of news and blogs with wikipedia for complementary navigation IR
Wikipedia
Blogs
News
Topic analysis
English 2010 0 0
Linking Wikipedia Entries to Blog Feeds by Machine Learning English 2009 0 0
Linking Wikipedia entries to blog feeds by machine learning Blog feed retrieval
Blogs
Topics
Wikipedia
ACM International Conference Proceeding Series English 2009 This paper studies the issue of conceptually indexing the blogosphere through the whole hierarchy of Wikipedia entries. This paper proposes how to link Wikipedia entries to blog feeds in the Japanese blogosphere by machine learning, where about 300,000 Wikipedia entries are used for representing a hierarchy of topics. In our experimental evaluation, we achieved over 80% precision in the task. Copyright 2009 ACM. 0 0
Mining cross-lingual/cross-cultural differences in concerns and opinions in blogs Blogs
Cross-Language Information Retrieval
Cultural gaps
Topic analysis
Wikipedia
Lecture Notes in Computer Science English 2009 The goal of this paper is to cross-lingually analyze multilingual blogs collected with a topic keyword. The framework of collecting multilingual blogs with a topic keyword is designed as the blog feed retrieval procedure. Mulitlingual queries for retrieving blog feeds are created from Wikipedia entries. Finally, we cross-lingually and cross-culturally compare less well known facts and opinions that are closely related to a given topic. Preliminary evaluation results support the effectiveness of the proposed framework. 0 0
Towards Conceptual Indexing of the Blogosphere through Wikipedia Topic Hierarchy English 2009 0 0
Cross-lingual blog analysis based on multilingual blog distillation from multilingual wikipedia entries ICWSM 2008 - Proceedings of the 2nd International Conference on Weblogs and Social Media English 2008 The goal of this paper is to cross-lingually analyze multilingual blogs collected with a topic keyword. The framework of collecting multilingual blogs with a topic keyword is designed as the blog distillation (feed search) procedure. Mulitlingual queries for retrieving blog feeds are created from Wikipedia entries. Finally, we cross-lingually and cross-culturally compare less well known facts and opinions that are closely related to a given topic. Preliminary evaluation results support the effectiveness of the proposed framework. Copyright © 2008, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. 0 0
Cross-lingual blog analysis by cross-lingual comparison of characteristic terms and blog posts Proceedings of the 2nd International Symposium on Universal Communication, ISUC 2008 English 2008 The goal of this paper is to cross-lingually analyze multilingual blogs collected with a topic keyword. The framework of collecting multilingual blogs with a topic keyword is designed as the blog feed retrieval procedure. Multilingual queries for retrieving blog feeds are created from Wikipedia entries. Finally, we cross-lingually and crossculturally compare less well known facts and opinions that are closely related to a given topic. Preliminary evaluation results support the effectiveness of the proposed framework. 0 0
KANSHIN: A cross-lingual concern analysis system using multilingual blog articles Proceedings - 2008 International Workshop on Information-Explosion and Next Generation Search, INGS 2008 English 2008 An architecture of cross-lingual concern analysis (CLCA) using multilingual blog articles, and its prototype system are described. As various people who are living in various countries use the Web, cross-lingual information retrieval (CLIR) plays an important role in the next generation search. In this paper, we propose a CLCA as one of CLIR applications for facilitating users to find concerns of people across languages. We propose a layer architecture of CLCA, and its prototype system called KANSHIN. The system collects Japanese, Chinese, Korean, and English blog articles, and analyzes concerns across languages. Users can find concerns from several viewpoints such as temporal, geographical, and a network of blog sites. The system also facilitates users to browse multilingual keywords using Wikipedia, and the system facilitates users to find spam blogs. An overview of the CLCA architecture and the system are described. 0 0