Bilingual dictionary

From WikiPapers

"Bilingual dictionary" is included as a keyword or extra keyword in 0 datasets, 0 tools and 10 publications.


There are no datasets for this keyword.


There are no tools for this keyword.


Title Author(s) Published in Language Date Abstract R C
Calculating Wikipedia article similarity using machine translation evaluation metrics
Maike Erdmann, Andrew Finch, Kotaro Nakayama, Eiichiro Sumita, Takahiro Hara, Shojiro Nishio
Proceedings - 25th IEEE International Conference on Advanced Information Networking and Applications Workshops, WAINA 2011. English, 2011.
Calculating the similarity of Wikipedia articles in different languages is helpful for bilingual dictionary construction and various other research areas. However, standard methods for document similarity calculation are usually very simple. Therefore, we describe an approach of translating one Wikipedia article into the language of the other article and then calculating article similarity with standard machine translation evaluation metrics. An experiment revealed that our approach is effective for identifying Wikipedia articles in different languages that cover the same concept. 0 0
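The core idea of this abstract can be sketched in a few lines: translate one article with any MT system, then score its n-gram overlap against the other article, in the style of BLEU-like MT evaluation metrics. This is a minimal illustration, not the authors' implementation; the metric below is a simplified clipped n-gram precision.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision, as used in BLEU-style MT metrics."""
    cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
    if not cand:
        return 0.0
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    return overlap / sum(cand.values())

def article_similarity(translated_article, target_article, max_n=2):
    """Average clipped n-gram precision between a machine-translated
    source article and the target-language article (toy similarity score)."""
    a, b = translated_article.split(), target_article.split()
    return sum(ngram_precision(a, b, n) for n in range(1, max_n + 1)) / max_n
```

A score of 1.0 means every n-gram of the translated article also occurs in the target article; real MT metrics add brevity penalties and smoothing on top of this.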
English-to-Korean cross-lingual link detection for Wikipedia
Marigomen R., Kang I.-S.
Communications in Computer and Information Science. English, 2011.
In this paper, we introduce a method for automatically discovering possible links between documents in different languages. We utilized the large collection of articles in Wikipedia as our resource for keyword extraction, word sense disambiguation and the creation of a bilingual dictionary. Given an English text or input document, our system automatically determines important words or phrases within the context and links them to corresponding Wikipedia articles in other languages. In this system we use the Korean Wikipedia corpus as the linking document. 0 0
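Stripped to its simplest form, link detection of this kind scans the text for phrases that match known source-language titles and maps them to target-language titles through an interlanguage-link dictionary. The sketch below is a naive stand-in for the paper's system (which adds keyword extraction and word sense disambiguation); `en_to_ko_titles` is a hypothetical mapping built from interlanguage links.

```python
def detect_cross_links(text, en_to_ko_titles):
    """Naive cross-lingual link detection: find English Wikipedia titles
    mentioned in the text and map them to Korean article titles via an
    interlanguage-link dictionary (toy sketch, not the paper's system)."""
    links = {}
    for title, ko_title in en_to_ko_titles.items():
        if title.lower() in text.lower():
            links[title] = ko_title
    return links
```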
Ranking multilingual documents using minimal language dependent resources
Santosh G.S.K., Kiran Kumar N., Vasudeva Varma
Lecture Notes in Computer Science. English, 2011.
This paper proposes an approach of extracting simple and effective features that enhance multilingual document ranking (MLDR). There is limited prior research on capturing the concept of multilingual document similarity in determining the ranking of documents, and the available literature relies heavily on language-specific tools, making those approaches hard to reimplement for other languages. Our approach extracts various multilingual and monolingual similarity features using a basic language resource (a bilingual dictionary). No language-specific tools are used, making this approach extensible to other languages. We used the datasets provided by the Forum for Information Retrieval Evaluation (FIRE) for their 2010 Adhoc Cross-Lingual document retrieval task on Indian languages. Experiments have been performed with different ranking algorithms and their results are compared. The results showcase the effectiveness of the considered features in enhancing multilingual document ranking. 0 0
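One representative example of a cross-lingual similarity feature that needs nothing beyond a bilingual dictionary is the fraction of query terms whose translations occur in the document. This is only an illustration of the kind of language-independent feature the abstract describes, not a feature taken from the paper.

```python
def dict_overlap_feature(query_terms, doc_terms, bilingual_dict):
    """Fraction of query terms with at least one dictionary translation
    present in the document: a simple language-independent cross-lingual
    feature (illustrative; the paper combines several such features)."""
    doc = set(doc_terms)
    hits = 0
    for term in query_terms:
        translations = bilingual_dict.get(term, [])
        if any(t in doc for t in translations):
            hits += 1
    return hits / len(query_terms) if query_terms else 0.0
```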
Creating a Wikipedia-based Persian-English word association dictionary
Rahimi Z., Shakery A.
2010 5th International Symposium on Telecommunications, IST 2010. English, 2010.
One of the most important issues in cross-language information retrieval is how to cross the language barrier between the query and the documents. Different translation resources have been studied for this purpose. In this research, we study using Wikipedia for query translation by constructing a Wikipedia-based bilingual association dictionary. We use English and Persian Wikipedia inter-language links to align related titles and then mine word-by-word associations between the two languages using the extracted alignments. We use the mined word association dictionary for translating queries in Persian-English cross-language information retrieval. Our experimental results on the Hamshahri corpus show that the proposed method is effective in extracting word associations and that the Persian Wikipedia is a promising translation resource. Using the association dictionary, we can improve the pure dictionary-based method, where the only translation resource is a bilingual dictionary, by 33.6% and its recall by 26.2%. 0 0
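Mining word-by-word associations from title pairs aligned via inter-language links can be sketched with a co-occurrence statistic such as the Dice coefficient: words that keep appearing together across aligned pairs score highly. This is a hedged sketch of the general technique; the paper's actual mining procedure and scoring are more elaborate.

```python
from collections import Counter
from itertools import product

def mine_associations(aligned_pairs, min_dice=0.1):
    """Mine word association scores from (English text, Persian text)
    pairs aligned via inter-language links, using the Dice coefficient
    over co-occurrence counts (illustrative sketch)."""
    cooc, en_count, fa_count = Counter(), Counter(), Counter()
    for en_text, fa_text in aligned_pairs:
        en_words, fa_words = set(en_text.split()), set(fa_text.split())
        en_count.update(en_words)
        fa_count.update(fa_words)
        cooc.update(product(en_words, fa_words))  # count every cross-language pair
    assoc = {}
    for (e, f), c in cooc.items():
        dice = 2 * c / (en_count[e] + fa_count[f])
        if dice >= min_dice:
            assoc[(e, f)] = dice
    return assoc
```

Word pairs that co-occur in every aligned pair where either word appears reach the maximum score of 1.0.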
Japanese-Chinese information retrieval with an iterative weighting scheme
Lin C.-C., Wang Y.-C., Tsai R.T.-H.
Journal of Information Science and Engineering. English, 2010.
This paper describes our Japanese-Chinese cross-language information retrieval system. We adopt a query-translation approach and employ both a conventional Japanese-Chinese bilingual dictionary and Wikipedia to translate query terms. We propose that Wikipedia can be regarded as a good dictionary for named entity translation. Based on the nature of the Japanese writing system, we propose that query terms be processed differently according to their written forms. We use an iterative method for weight tuning and term disambiguation, based on the PageRank algorithm. When evaluated on the NTCIR-5 test set, our system achieves relax MAP (Mean Average Precision) scores as high as 0.2217 and 0.2276 for T-runs and D-runs, respectively. 0 0
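The iterative weighting the abstract refers to is based on PageRank, which can be stated compactly: each node's weight is redistributed to its neighbours, damped toward a uniform prior, until the scores stabilize. This is generic PageRank over an adjacency dict, not the paper's exact disambiguation scheme (which builds the graph from translation candidates).

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Plain PageRank over an adjacency dict {node: [out-neighbors]}.
    Every node is assumed to have at least one outgoing edge."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {}
        for n in nodes:
            # mass flowing into n from every node m that links to it
            incoming = sum(rank[m] / len(graph[m]) for m in nodes if n in graph[m])
            new[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new
    return rank
```

In a term-disambiguation setting, nodes would be candidate translations and edges co-occurrence or relatedness links, so well-connected candidates accumulate weight.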
Query translation using Wikipedia-based resources for analysis and disambiguation
Gaillard B., Boualem M., Collin O.
EAMT 2010 - 14th Annual Conference of the European Association for Machine Translation. English, 2010.
This work investigates query translation using only Wikipedia-based resources in a two-step approach: analysis and disambiguation. After arguing that data mined from Wikipedia is particularly relevant to query translation, both from a lexical and a semantic perspective, we detail the implementation of the approach. In the analysis phase, lexical units are extracted from queries and associated with several possible translations using a Wikipedia-based bilingual dictionary. During the second phase, one translation is chosen among the many candidates based on topic homogeneity, assessed with the help of semantic information carried by the categories of Wikipedia articles. We report promising results regarding translation accuracy. 0 0
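Disambiguation by topic homogeneity can be illustrated as picking the candidate translation whose Wikipedia categories overlap most with the topical context of the rest of the query. The data shapes below (`candidates` as dicts with `translation` and `categories` keys) are made up for the example; the paper's scoring is richer.

```python
def pick_translation(candidates, query_categories):
    """Choose the candidate translation whose Wikipedia categories overlap
    most with the categories of the other query terms (toy version of
    topic-homogeneity disambiguation; field names are illustrative)."""
    def overlap(cand):
        return len(set(cand["categories"]) & set(query_categories))
    return max(candidates, key=overlap)["translation"]
```

For example, in a finance-themed query the English word "bank" would resolve to the monetary sense rather than the riverside sense.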
Improving the extraction of bilingual terminology from Wikipedia
Maike Erdmann, Kotaro Nakayama, Takahiro Hara, Shojiro Nishio
ACM Trans. Multimedia Comput. Commun. Appl. English, 2009.
Research on the automatic construction of bilingual dictionaries has achieved impressive results. Bilingual dictionaries are usually constructed from parallel corpora, but since these corpora are available only for selected text domains and language pairs, the potential of other resources is being explored as well. In this article, we want to further pursue the idea of using Wikipedia as a corpus for bilingual terminology extraction. We propose a method that extracts term-translation pairs from different types of Wikipedia link information. After that, an SVM classifier trained on the features of manually labeled training data determines the correctness of unseen term-translation pairs. 0 0
Trdlo, an open source tool for building transducing dictionary
Grac M.
Lecture Notes in Computer Science. English, 2009.
This paper describes the development of an open-source tool named Trdlo. Trdlo was developed as part of our effort to build a machine translation system between very close languages. These languages usually do not have pre-processed linguistic resources or dictionaries suitable for computer processing available. Bilingual dictionaries have a big impact on translation quality. The methods proposed in this paper attempt to extend existing dictionaries with inferable translation pairs. Our approach requires only 'cheap' resources: a list of lemmata for each language and rules for inferring words from one language to another. It is also possible to use other resources such as annotated corpora or Wikipedia. Results show that this approach greatly improves the effectiveness of building a Czech-Slovak dictionary. 0 0
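The inference idea for close language pairs can be sketched as applying substring rewrite rules to source-language lemmata and keeping only candidates that actually occur in the target lemma list. Everything here (the rule format, the example Slovak/Czech rule) is illustrative, not Trdlo's actual rule machinery.

```python
def infer_translations(lemmas, rules):
    """Extend a dictionary with inferable pairs: rewrite each source lemma
    with (src_substring, dst_substring) rules and accept a pair when the
    result is a known target lemma (sketch of the rule-inference idea)."""
    target_lemmas = set(lemmas["target"])
    pairs = []
    for word in lemmas["source"]:
        for src, dst in rules:
            candidate = word.replace(src, dst)
            if candidate in target_lemmas:
                pairs.append((word, candidate))
    return pairs
```

For very close languages many valid pairs differ only by regular sound correspondences, which is why a handful of such rules can grow a dictionary cheaply.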
Generating patterns for extracting Chinese-Korean named entity translations from the Web
Yeh C.-H., Tsai W.-C., Wang Y.-C., Tsai R.T.-H.
Proceedings of the 20th Conference on Computational Linguistics and Speech Processing, ROCLING 2008. English, 2008.
One of the main difficulties in Chinese-Korean cross-language information retrieval is translating named entities (NEs) in queries. Unlike common words, most NEs are not found in bilingual dictionaries. This paper presents a pattern-based method of finding NE translations online. The most important feature of our system is that patterns are generated and weighted automatically, saving considerable human effort. Our experimental data consists of 160 Chinese-Korean NE pairs selected from Wikipedia in five domains. Our approach achieves a very high MAP of 0.84, which demonstrates our system's practicability. 0 0
WikiBABEL: Community creation of multilingual data
Kumaran A., Saravanan K., Maurice S.
WikiSym 2008 - The 4th International Symposium on Wikis, Proceedings. English, 2008.
In this paper, we present a collaborative framework - wikiBABEL - for the efficient and effective creation of multilingual content by a community of users. The wikiBABEL framework leverages the availability of fairly stable content in a source language (typically English) and a reasonable, not necessarily perfect, machine translation system between the source language and a given target language to create rough initial content in the target language, which is published on a collaborative platform. The platform provides an intuitive user interface and a set of linguistic tools for collaborative correction of the rough content by a community of users, aiding the creation of clean content in the target language. We describe the architectural components implementing the wikiBABEL framework, namely the systems for source and target language content management, the mechanisms for coordination and collaboration, and the intuitive user interface for multilingual editing and review. Importantly, we discuss the integrated linguistic resources and tools, such as bilingual dictionaries, machine translation and transliteration systems, that help users during the content correction and creation process. We also analyze and present the prime factors - user-interface features or linguistic tools and resources - that significantly influence the user experience in multilingual content creation. Beyond the creation of multilingual content, another significant motivation for the wikiBABEL framework is the creation of parallel corpora as a by-product. Parallel linguistic corpora are very valuable resources for both Statistical Machine Translation (SMT) and Cross-lingual Information Retrieval (CLIR) research, and may be mined effectively from multilingual data with significant content overlap, as may be created in the wikiBABEL framework. Creation of parallel corpora by professional translators is very expensive, and hence SMT and CLIR research has been largely confined to a handful of languages. Our attempt to engage the large and diverse Internet user population may aid the creation of such linguistic resources economically, and may make computational linguistics research possible and practical in many languages of the world. 0 0