Multi-modal

From WikiPapers

Multi-modal is included as a keyword or extra keyword in 0 datasets, 0 tools and 13 publications.

Datasets

There are no datasets for this keyword.

Tools

There are no tools for this keyword.


Publications

Title: Evaluation of WikiTalk - User studies of human-robot interaction
Author(s): Anastasiou D., Kristiina Jokinen, Graham Wilcock
Published in: Lecture Notes in Computer Science
Language: English
Date: 2013
Abstract: The paper concerns the evaluation of Nao WikiTalk, an application that enables a Nao robot to serve as a spoken open-domain knowledge access system. With Nao WikiTalk the robot can talk about any topic the user is interested in, using Wikipedia as its knowledge source. The robot suggests some topics to start with, and the user shifts to related topics by speaking their names after the robot mentions them. The user can also switch to a totally new topic by spelling the first few letters. As well as speaking, the robot uses gestures, nods and other multimodal signals to enable clear and rich interaction. The paper describes the setup of the user studies and reports on the evaluation of the application, based on various factors reported by the 12 users who participated. The study compared the users' expectations of the robot interaction with their actual experience of the interaction. We found that the users were impressed by the lively appearance and natural gesturing of the robot, although in many respects they had higher expectations regarding the robot's presentation capabilities. However, the results are positive enough to encourage research on these lines.
R: 0, C: 0

Title: Search in WikiImages using mobile phone
Author(s): Havasi L., Szabo M., Pataki M., Varga D., Sziranyi T., Kovacs L.
Published in: Proceedings - International Workshop on Content-Based Multimedia Indexing
Language: English
Date: 2013
Abstract: The demonstration focuses on content-based retrieval of Wikipedia images (Hungarian version). A mobile application for iOS is used to gather images and send them directly to the cross-modal processing framework. Searching is implemented in a high-performance hybrid index tree with 500k entries in total. The hit list is converted to wiki pages and ordered by the content-based score.
R: 0, C: 0

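The last step the abstract describes, turning an image hit list into a ranked list of wiki pages, can be illustrated with a minimal sketch. This is not the authors' code; the index output format and the image-to-page mapping below are hypothetical placeholders.

```python
# Hedged sketch (not the authors' code): mapping content-based image hits
# to a ranked list of wiki pages, as the abstract describes. The hit list,
# scores, and image->page mapping here are hypothetical placeholders.
from collections import defaultdict

def rank_wikipages(hits, image_to_page):
    """hits: list of (image_id, score) from the image index;
    image_to_page: mapping from image_id to the wiki page embedding it."""
    page_scores = defaultdict(float)
    for image_id, score in hits:
        page = image_to_page.get(image_id)
        if page is not None:
            # Aggregate by max so a page is ranked by its best-matching image.
            page_scores[page] = max(page_scores[page], score)
    return sorted(page_scores.items(), key=lambda kv: kv[1], reverse=True)

# Example usage with made-up data:
hits = [("img_42", 0.91), ("img_7", 0.80), ("img_13", 0.77)]
image_to_page = {"img_42": "Budapest", "img_7": "Budapest", "img_13": "Danube"}
print(rank_wikipages(hits, image_to_page))  # [('Budapest', 0.91), ('Danube', 0.77)]
```
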
Title: Cross domain search by exploiting Wikipedia
Author(s): Che-Hung Liu, Wu S., Jiang S., Tung A.K.H.
Published in: Proceedings - International Conference on Data Engineering
Language: English
Date: 2012
Abstract: The abundance of Web 2.0 resources in various media formats calls for better resource integration to enrich user experience. This naturally leads to a new cross-modal resource search requirement, in which a query is a resource in one modality and the results are closely related resources in other modalities. With cross-modal search, we can better exploit existing resources. Tags associated with Web 2.0 resources are an intuitive medium to link resources of different modalities together. However, tagging is by nature an ad hoc activity. Tags often contain noise and are affected by the subjective inclination of the tagger. Consequently, linking resources simply by tags is not reliable. In this paper, we propose an approach for linking tagged resources to concepts extracted from Wikipedia, which has become a fairly reliable reference over the last few years. Compared to the tags, the concepts are therefore of higher quality. We develop effective methods for cross-modal search based on the concepts associated with resources. Extensive experiments were conducted, and the results show that our solution achieves good performance.
R: 0, C: 0

Title: Cross-modal information retrieval - A case study on Chinese wikipedia
Author(s): Cong Y., Qin Z., Jian Yu, Wan T.
Published in: Lecture Notes in Computer Science
Language: English
Date: 2012
Abstract: Probability models have recently been used in cross-modal multimedia information retrieval by building conjunctive models bridging the text and image components. Previous studies have shown that a cross-modal information retrieval system using the topic correlation model (TCM) outperforms state-of-the-art models on English corpora. In this paper, we focus on the Chinese language, which differs from western languages composed of alphabets. Words and characters are chosen as the basic structural units of Chinese, respectively. We also set up a test database, named Ch-Wikipedia, in which documents with paired image and text are extracted from the Chinese website of Wikipedia. We investigate the problems of retrieving texts (ranked by semantic closeness) given an image query, and vice versa. The capabilities of the TCM model are verified by experiments on the Ch-Wikipedia dataset.
R: 0, C: 0

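The choice of basic structural units the abstract mentions (words vs. characters) can be illustrated with a short tokenization sketch. The paper does not name a segmentation tool; jieba is used here purely as an assumed stand-in for word-level segmentation.

```python
# Hedged sketch: the two Chinese tokenization granularities the abstract
# compares (words vs. characters). jieba is one common segmenter; the paper
# does not say which tool was used, so this choice is an assumption.
import jieba  # pip install jieba

text = "跨模态信息检索"  # "cross-modal information retrieval"

word_units = jieba.lcut(text)  # word-level units, e.g. ['跨', '模态', '信息', '检索']
char_units = list(text)        # character-level units, one per Hanzi

print(word_units)
print(char_units)
```
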
Title: Cross-modal topic correlations for multimedia retrieval
Author(s): Jian Yu, Cong Y., Qin Z., Wan T.
Published in: Proceedings - International Conference on Pattern Recognition
Language: English
Date: 2012
Abstract: In this paper, we propose a novel approach for cross-modal multimedia retrieval by jointly modeling the text and image components of multimedia documents. In this model, the image component is represented by local SIFT descriptors based on the bag-of-features model. The text component is represented by a topic distribution learned from latent topic models such as latent Dirichlet allocation (LDA). The latent semantic relations between texts and images can be reflected by correlations between the word topics and the topics of image features. A statistical correlation model conditioned on category information is investigated. Experimental results on a benchmark Wikipedia dataset show that the newly proposed approach outperforms state-of-the-art cross-modal multimedia retrieval systems.
R: 0, C: 0

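A minimal sketch of the general pipeline this abstract describes: LDA topic distributions on the text side, a bag-of-visual-words built from local descriptors on the image side, and correlations between the two. This is an illustration on toy data, not the authors' implementation; the SIFT descriptors are replaced by random placeholders and the category-conditioned part of their model is omitted.

```python
# Hedged sketch of the pipeline described in the abstract: text topics from
# LDA, image features quantized into a bag-of-visual-words, and correlations
# between the two representations. Corpus and descriptors are toy placeholders.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

texts = ["a cat sits on the mat", "stock markets fell sharply",
         "the cat chased a mouse", "investors sold shares today"]

# Text side: LDA topic distributions per document.
counts = CountVectorizer().fit_transform(texts)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
text_topics = lda.fit_transform(counts)                    # shape (4, 2)

# Image side: stand-in for 128-D SIFT descriptors, quantized by k-means into
# a small visual vocabulary, then histogrammed per "image".
rng = np.random.default_rng(0)
descriptors = [rng.normal(size=(50, 128)) for _ in texts]  # placeholder SIFT
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(np.vstack(descriptors))
bovw = np.array([np.bincount(kmeans.predict(d), minlength=8) for d in descriptors])

# Cross-modal correlations between word topics and visual words.
corr = np.corrcoef(text_topics.T, bovw.T)[:2, 2:]          # shape (2, 8)
print(corr)
```
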
Title: Managing multimodal and multilingual semantic content
Author(s): Marcel Martin, Gerber D., Heino N., Sören Auer, Ermilov T.
Published in: WEBIST 2011 - Proceedings of the 7th International Conference on Web Information Systems and Technologies
Language: English
Date: 2011
Abstract: With the advent and increasing popularity of Semantic Wikis and Linked Data, the management of semantically represented knowledge has become mainstream. However, certain categories of semantically enriched content, such as multimodal documents as well as multilingual textual resources, are still difficult to handle. In this paper, we present a comprehensive strategy for managing the life-cycle of both multimodal and multilingual semantically enriched content. The strategy is based on extending a number of semantic knowledge management techniques, such as authoring, versioning, evolution, access and exploration, for semantically enriched multimodal and multilingual content. We showcase an implementation and user interface based on the semantic wiki paradigm and present a use case from the e-tourism domain.
R: 0, C: 0

Title: Maximum covariance unfolding: Manifold learning for bimodal data
Author(s): Mahadevan V., Wong C.W., Pereira J.C., Liu T.T., Vasconcelos N., Saul L.K.
Published in: Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011
Language: English
Date: 2011
Abstract: We propose maximum covariance unfolding (MCU), a manifold learning algorithm for simultaneous dimensionality reduction of data from different input modalities. Given high dimensional inputs from two different but naturally aligned sources, MCU computes a common low dimensional embedding that maximizes the cross-modal (inter-source) correlations while preserving the local (intra-source) distances. In this paper, we explore two applications of MCU. First we use MCU to analyze EEG-fMRI data, where an important goal is to visualize the fMRI voxels that are most strongly correlated with changes in EEG traces. To perform this visualization, we augment MCU with an additional step for metric learning in the high dimensional voxel space. Second, we use MCU to perform cross-modal retrieval of matched image and text samples from Wikipedia. To manage large applications of MCU, we develop a fast implementation based on ideas from spectral graph theory. These ideas transform the original problem for MCU, one of semidefinite programming, into a simpler problem in semidefinite quadratic linear programming.
R: 0, C: 0

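The optimization the abstract describes can be sketched as a small semidefinite program: maximize the cross-modal covariance of a joint embedding while preserving local intra-modal distances. This is a reconstruction from the abstract's wording, not the authors' code; the neighborhood rule, toy data, and use of cvxpy are assumptions, and the fast SQLP reformulation they mention is not shown.

```python
# Hedged MCU-style sketch: a semidefinite program over a joint Gram matrix
# that maximizes the cross-modal covariance block while preserving local
# intra-modal distances. Requires cvxpy (pip install cvxpy).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n = 8                                    # paired samples from two modalities
X = rng.normal(size=(n, 5))              # modality 1 (toy data)
Y = X @ rng.normal(size=(5, 3)) + 0.1 * rng.normal(size=(n, 3))  # aligned modality 2

def knn_pairs(Z, k=2):
    """Index pairs of each point with its k nearest neighbors."""
    D = np.linalg.norm(Z[:, None] - Z[None, :], axis=-1)
    return [(i, j) for i in range(len(Z)) for j in np.argsort(D[i])[1:k + 1]]

K = cp.Variable((2 * n, 2 * n), PSD=True)    # Gram matrix over both modalities
cons = [cp.sum(K) == 0]                       # centering
for Z, off in ((X, 0), (Y, n)):               # local isometry per modality
    for i, j in knn_pairs(Z):
        d2 = float(np.sum((Z[i] - Z[j]) ** 2))
        cons.append(K[off + i, off + i] + K[off + j, off + j]
                    - 2 * K[off + i, off + j] == d2)

# Maximize the trace of the cross-modal covariance block.
cp.Problem(cp.Maximize(cp.trace(K[:n, n:])), cons).solve()

# Shared low-dimensional embedding from the top eigenvectors of K.
w, V = np.linalg.eigh(K.value)
embedding = V[:, -2:] * np.sqrt(np.maximum(w[-2:], 0))
print(embedding.shape)                        # (16, 2): both modalities embedded
```
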
Title: Crew: cross-modal resource searching by exploiting Wikipedia
Author(s): Chen Liu, Beng C. Ooi, Anthony K. H. Tung, Dongxiang Zhang
Language: English
Date: 2010
Abstract: In Web 2.0, users have generated and shared massive amounts of resources in various media formats, such as news, blogs, audio, photos and videos. The abundance and diversity of the resources call for better integration to improve their accessibility. A straightforward approach is to link the resources via tags so that resources from different modalities sharing the same tag can be connected in a graph structure. This naturally motivates a new kind of information retrieval system, named cross-modal resource search, in which, given a query object in any modality, all the related resources from other modalities can be retrieved in a convenient manner. However, due to tag homonymy and synonymy, such an approach returns results of low quality, because resources that share a tag but are not semantically related will be directly connected as well. In this paper, we propose to build the resource graph and perform query processing by exploiting Wikipedia. We construct a concept middleware between the layer of tags and the resources to fully capture their semantic meaning. Such a cross-modal search system based on Wikipedia, named Crew, is built and demonstrates promising search results.
R: 0, C: 0

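The concept middleware idea can be illustrated with a minimal sketch: tags are first mapped to Wikipedia concepts, and resources are then linked through shared concepts rather than raw tag strings. The tag-to-concept mapping below is a hand-made placeholder; in the paper it is derived from Wikipedia, and the actual disambiguation logic is not shown.

```python
# Hedged sketch of the "concept middleware": resources in different modalities
# are connected through shared Wikipedia concepts instead of raw tags. The
# tag->concept mapping is a hypothetical placeholder, not the paper's method.
from collections import defaultdict

tag_to_concept = {
    "apple": "Apple Inc.",     # assume context already resolved the homonym
    "iphone": "Apple Inc.",
    "nyc": "New York City",    # synonym of "new york"
    "new york": "New York City",
}

resources = [
    ("photo_1", "image", ["apple", "keynote"]),
    ("clip_9", "video", ["iphone"]),
    ("post_4", "text", ["nyc"]),
    ("photo_2", "image", ["new york"]),
]

concept_index = defaultdict(list)
for rid, modality, tags in resources:
    for tag in tags:
        concept = tag_to_concept.get(tag)
        if concept:
            concept_index[concept].append((rid, modality))

def cross_modal_search(query_id):
    """Return resources in other modalities sharing a concept with the query."""
    hits = []
    for concept, members in concept_index.items():
        modality_of = dict(members)
        if query_id in modality_of:
            hits += [(r, m, concept) for r, m in members
                     if r != query_id and m != modality_of[query_id]]
    return hits

print(cross_modal_search("photo_1"))  # [('clip_9', 'video', 'Apple Inc.')]
```
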
Title: Multimodal image retrieval over a large database
Author(s): Myoupo D., Adrian Popescu, Le Borgne H., Moellic P.-A.
Published in: Lecture Notes in Computer Science
Language: English
Date: 2010
Abstract: We introduce a new multimodal retrieval technique which combines query reformulation and visual image reranking in order to deal with result sparsity and imprecision, respectively. Textual queries are reformulated using Wikipedia knowledge and results are then reordered using a k-NN based reranking method. We compare textual and multimodal retrieval and show that introducing visual reranking results in a significant improvement in performance.
R: 0, C: 0

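A minimal sketch of k-NN based visual reranking in the spirit of this abstract: a text-retrieved candidate list is reordered by visual similarity to the query's nearest visual neighbors. The feature vectors and the exact neighborhood-consistency score are assumptions, not the paper's method details.

```python
# Hedged sketch of k-NN visual reranking: text retrieval yields candidates,
# which are reordered by visual similarity. Feature vectors are toy placeholders.
import numpy as np

rng = np.random.default_rng(1)
candidates = [f"img_{i}" for i in range(10)]          # text-retrieval results
feats = {c: rng.normal(size=32) for c in candidates}  # visual features (placeholder)
query_feat = rng.normal(size=32)

def knn_rerank(candidates, feats, query_feat, k=3):
    """Score each candidate by mean cosine similarity to the query's k
    visually nearest candidates (a simple neighborhood-consistency score)."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    sims = {c: cos(feats[c], query_feat) for c in candidates}
    knn = sorted(candidates, key=lambda c: -sims[c])[:k]
    scores = {c: np.mean([cos(feats[c], feats[n]) for n in knn]) for c in candidates}
    return sorted(candidates, key=lambda c: -scores[c])

print(knn_rerank(candidates, feats, query_feat))
```
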
Title: Towards community discovery in signed collaborative interaction networks
Author(s): Bogdanov P., Larusso N.D., Amit Singh
Published in: Proceedings - IEEE International Conference on Data Mining, ICDM
Language: English
Date: 2010
Abstract: We propose a framework for discovery of collaborative community structure in Wiki-based knowledge repositories based on raw-content generation analysis. We leverage topic modeling to capture agreement and opposition among contributors and analyze these multi-modal relations to map communities in the contributor base. The key steps of our approach include (i) modeling of pairwise variable-strength contributor interactions that can be both positive and negative, (ii) synthesis of a global network incorporating all pairwise interactions, and (iii) detection and analysis of community structure encoded in such networks. The global community discovery algorithm we propose outperforms existing alternatives in identifying coherent clusters according to objective optimality criteria. Analysis of the discovered community structure reveals coalitions of common-interest editors who back each other in promoting some topics and collectively oppose other coalitions or single authors. We couple contributor interactions with content evolution and reveal the global picture of opposing themes within the self-regulated community base for both controversial and featured articles in Wikipedia.
R: 0, C: 0

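Steps (i) and (ii) of the approach can be illustrated with a toy sketch: pairwise signed, variable-strength interactions are derived from contributors' topic-level stances and assembled into one global network. The stance vectors and the agreement measure below are placeholders, not the authors' topic model.

```python
# Hedged sketch of steps (i)-(ii): signed, variable-strength contributor
# interactions assembled into one global network. Stances and the agreement
# measure are assumptions, not the paper's model.
import numpy as np
import networkx as nx

rng = np.random.default_rng(2)
contributors = ["alice", "bob", "carol", "dave"]
# Per-contributor stance over topics in [-1, 1] (placeholder for topic models).
stance = {c: rng.uniform(-1, 1, size=5) for c in contributors}

G = nx.Graph()
for i, a in enumerate(contributors):
    for b in contributors[i + 1:]:
        # Positive weight = agreement, negative weight = opposition.
        w = float(np.mean(stance[a] * stance[b]))
        G.add_edge(a, b, weight=w)

for a, b, d in G.edges(data=True):
    print(a, b, round(d["weight"], 2))
```
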
Title: Comprehensive query-dependent fusion using regression-on-folksonomies: A case study of multimodal music search
Author(s): Bin Zhang, Xiang Q., Lu H., Shen J., Yafang Wang
Published in: MM'09 - Proceedings of the 2009 ACM Multimedia Conference, with Co-located Workshops and Symposiums
Language: English
Date: 2009
Abstract: The combination of heterogeneous knowledge sources has been widely regarded as an effective approach to boost retrieval accuracy in many information retrieval domains. While various technologies have been recently developed for information retrieval, multimodal music search has not kept pace with the enormous growth of data on the Internet. In this paper, we study the problem of integrating multiple online information sources to conduct effective query-dependent fusion (QDF) of multiple search experts for music retrieval. We have developed a novel framework to construct a knowledge space of users' information need from online folksonomy data. With this innovation, a large number of comprehensive queries can be automatically constructed to train a QDF system that generalizes better to unseen user queries. In addition, our framework models the QDF problem by regression of the optimal combination strategy on a query. Distinguished from previous approaches, the regression model of QDF (RQDF) offers superior modeling capability with fewer constraints and more efficient computation. To validate our approach, a large-scale test collection has been collected from different online sources, such as Last.fm, Wikipedia, and YouTube. All test data will be released to the public for better research synergy in multimodal music search. Our performance study indicates that the accuracy, efficiency, and robustness of multimodal music search can be improved significantly by the proposed folksonomy-RQDF approach. In addition, since no human involvement is required to collect training examples, our approach offers great feasibility and practicality in system development.
R: 0, C: 0

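The core RQDF idea, regressing the optimal fusion strategy on the query, can be sketched as follows. The query features, per-query optimal weights, and ridge regression used here are toy assumptions; the paper's folksonomy-derived training queries are not reproduced.

```python
# Hedged sketch of regression-based query-dependent fusion: a regressor maps
# query features to the weights used to combine several search experts.
# Features, weights, and experts are toy stand-ins, not the paper's setup.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
n_queries, n_feats, n_experts = 200, 10, 3

Q = rng.normal(size=(n_queries, n_feats))             # query feature vectors
W = np.abs(rng.normal(size=(n_queries, n_experts)))   # "optimal" per-query weights
W /= W.sum(axis=1, keepdims=True)                     # (found offline in training)

model = Ridge(alpha=1.0).fit(Q, W)                    # regress weights on queries

def fuse(query_feats, expert_scores):
    """Combine expert score lists using predicted per-query weights."""
    w = np.clip(model.predict(query_feats[None])[0], 0, None)
    w = w / w.sum() if w.sum() > 0 else np.full(len(w), 1 / len(w))
    return sum(wi * s for wi, s in zip(w, expert_scores))

scores = [rng.random(5) for _ in range(n_experts)]    # expert scores for 5 docs
print(fuse(rng.normal(size=n_feats), scores))
```
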
Title: Meta-classifiers for multimodal document classification
Author(s): Chen S.D., Monga V., Moulin P.
Published in: 2009 IEEE International Workshop on Multimedia Signal Processing, MMSP '09
Language: English
Date: 2009
Abstract: This paper proposes learning algorithms for the problem of multimodal document classification. Specifically, we develop classifiers that automatically assign documents to categories by exploiting features from both text and image content. In particular, we use meta-classifiers that combine state-of-the-art text- and image-based classifiers into making joint decisions. The two meta-classifiers we choose are based on support vector machines and AdaBoost. Experiments on real-world databases from Wikipedia demonstrate the benefits of jointly exploiting these modalities.
R: 0, C: 0

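A minimal sketch of the meta-classifier setup this abstract describes: base classifiers for the text and image modalities emit class probabilities, which a meta-classifier (an SVM here, one of the paper's two choices) combines into a joint decision. All data and base models below are synthetic stand-ins.

```python
# Hedged sketch of the meta-classifier idea: base text and image classifiers
# emit class probabilities; a meta-classifier (SVM, one of the paper's two
# choices) learns the joint decision. Data and base models are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(4)
n = 300
y = rng.integers(0, 2, size=n)
X_text = rng.normal(size=(n, 20)) + y[:, None]        # toy text features
X_img = rng.normal(size=(n, 30)) + 0.5 * y[:, None]   # toy image features

# Base classifiers, one per modality (stand-ins for the paper's base models).
text_clf = LogisticRegression(max_iter=1000).fit(X_text[:200], y[:200])
img_clf = LogisticRegression(max_iter=1000).fit(X_img[:200], y[:200])

# Meta-features: stacked class probabilities from both modalities. (For
# brevity, base and meta models share training data; a real system would
# use disjoint folds to avoid leakage.)
def meta_features(Xt, Xi):
    return np.hstack([text_clf.predict_proba(Xt), img_clf.predict_proba(Xi)])

meta_clf = SVC().fit(meta_features(X_text[:200], X_img[:200]), y[:200])
acc = (meta_clf.predict(meta_features(X_text[200:], X_img[200:])) == y[200:]).mean()
print(f"meta-classifier accuracy on held-out toy data: {acc:.2f}")
```
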
Title: Using evidences based on natural language to drive the process of fusing multimodal sources
Author(s): Navarro S., Llopis F., Munoz R.
Published in: Lecture Notes in Computer Science
Language: English
Date: 2009
Abstract: This paper focuses on the proposal and evaluation of two multimodal fusion techniques in the field of Visual Information Retrieval (VIR). These proposals are based on two widely used fusion strategies in the VIR area, multimodal blind relevance feedback and the multimodal re-ranking strategy. Unlike existing techniques, our alternative proposals are guided by the evidence found in the natural language annotations related to the images. The results achieved by our runs in two different ImageCLEF tasks (3rd place in the Wikipedia task [1] and 4th place among all automatic runs in the photo task [2]), together with the results of later experiments presented in this paper, show that the use of conceptual information associated with an image can significantly improve the performance of the original multimodal fusion techniques.
R: 0, C: 0

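One of the two fusion strategies this paper builds on, multimodal blind relevance feedback, can be sketched in a few lines: the query is expanded toward the centroid of the top results from a first retrieval pass (a Rocchio-style step). The vectors and weights are toy placeholders, not the paper's settings, and the natural-language guidance the authors add is not modeled.

```python
# Hedged sketch of blind relevance feedback (Rocchio-style): expand the query
# toward the centroid of the top-ranked first-pass results. Vectors and the
# alpha/beta weights are toy placeholders, not the paper's configuration.
import numpy as np

def blind_feedback(query_vec, ranked_doc_vecs, top_k=5, alpha=1.0, beta=0.5):
    """Expand the query with the centroid of the top_k first-pass results."""
    centroid = np.mean(ranked_doc_vecs[:top_k], axis=0)
    return alpha * query_vec + beta * centroid

rng = np.random.default_rng(5)
query = rng.random(16)                 # e.g. text features of the query
first_pass = rng.random((20, 16))      # vectors of first-pass ranked documents
expanded = blind_feedback(query, first_pass)
print(expanded.shape)                  # (16,): expanded query for a second pass
```
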