Machine translation

From WikiPapers

"machine translation" is included as a keyword or extra keyword in 0 datasets, 0 tools and 13 publications.

Datasets

There are no datasets for this keyword.

Tools

There are no tools for this keyword.


Publications

Each entry below lists the title, author(s), publication venue, language, date, abstract, and the R and C counts.

Title: A seed based method for dictionary translation
Author(s): Krajewski R., Rybinski H., Kozlowski M.
Published in: Lecture Notes in Computer Science
Language: English
Date: 2014
Abstract: The paper addresses automatic machine translation. The proposed method translates a dictionary by mining repositories in the source and target languages, without any directly given relationships connecting the two languages. It consists of two stages: (1) translation by lexical similarity, where words are compared graphically, and (2) translation by semantic similarity, where contexts are compared. The Polish and English versions of Wikipedia were used as multilingual corpora. The method and its stages are thoroughly analyzed, and the results allow implementing the method in human-in-the-middle systems.
R: 0  C: 0
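
As a rough illustration of the two-stage idea in the abstract above, the sketch below compares candidate translations first by surface (graphical) similarity and then by the similarity of the contexts they occur in, mapped through a seed dictionary. The measures, function names, and data layout are illustrative assumptions, not the paper's actual method.

```python
from collections import Counter
from difflib import SequenceMatcher
from math import sqrt

def lexical_similarity(src_word: str, tgt_word: str) -> float:
    """Stage 1: graphical comparison of the two word forms (surface similarity)."""
    return SequenceMatcher(None, src_word.lower(), tgt_word.lower()).ratio()

def context_vector(word: str, corpus: list, window: int = 3) -> Counter:
    """Bag of words co-occurring with `word` within a fixed window, as a simple context model."""
    ctx = Counter()
    for sent in corpus:                      # corpus: list of tokenized sentences
        for i, tok in enumerate(sent):
            if tok == word:
                ctx.update(sent[max(0, i - window):i] + sent[i + 1:i + 1 + window])
    return ctx

def semantic_similarity(src_ctx: Counter, tgt_ctx: Counter, seed: dict) -> float:
    """Stage 2: cosine similarity of contexts, with the source context mapped through a seed dictionary."""
    mapped = Counter()
    for w, c in src_ctx.items():
        if w in seed:
            mapped[seed[w]] += c
    dot = sum(c * tgt_ctx[w] for w, c in mapped.items())
    norm = sqrt(sum(c * c for c in mapped.values())) * sqrt(sum(c * c for c in tgt_ctx.values()))
    return dot / norm if norm else 0.0
```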

Title: Supporting multilingual discussion for collaborative translation
Author(s): Noriyuki Ishida, Lin D., Toshiyuki Takasaki, Toru Ishida
Published in: Proceedings of the 2012 International Conference on Collaboration Technologies and Systems, CTS 2012
Language: English
Date: 2012
Abstract: In recent years, collaborative translation has become increasingly important for translation volunteers sharing knowledge across languages; Wikipedia translation activity is a typical example. During collaborative translation, users with different mother tongues frequently discuss particular words or expressions in order to understand the original article and decide on the correct translation. To support such multilingual discussions, we propose an approach that embeds a service-oriented multilingual infrastructure with discussion functions in collaborative translation systems, where discussions can be automatically translated into different languages with machine translators, dictionaries, and so on. Moreover, we propose a Meta Translation Algorithm adapted to the features of discussions in collaborative translation, where discussion articles always consist of expressions in different languages. Further, we implement the proposed approach on LiquidThreads, a BBS on Wikipedia, and apply it to multilingual discussion for Wikipedia translation to verify the effectiveness of this research.
R: 0  C: 0

Title: Analysis on multilingual discussion for Wikipedia translation
Author(s): Linsi Xia, Naomi Yamashita, Toru Ishida
Published in: Proceedings - 2011 2nd International Conference on Culture and Computing, Culture and Computing 2011
Language: English
Date: 2011
Abstract: In current Wikipedia translation activities, most translation tasks are performed by bilingual speakers who have high language skills and specialized knowledge of the articles. Unfortunately, compared to the large number of Wikipedia articles, the number of such qualified translators is very small. Thus the success of Wikipedia translation activities hinges on contributions from non-bilingual speakers. In this paper, we report on a study investigating the effects of introducing a machine-translation-mediated BBS that enables monolinguals to collaboratively translate Wikipedia articles using their mother tongues. From our experiment using this system, we found that users made heavy use of the system and communicated actively across different languages. Furthermore, most of these multilingual discussions seemed to be successful in transferring knowledge between different languages. Such success appeared to be made possible by a distinctive communication pattern that emerged as the users tried to avoid misunderstandings caused by machine translation errors. These findings suggest that non-bilingual speakers have a fair chance of contributing effectively to Wikipedia translation activities with the assistance of machine translation.
R: 0  C: 0

Title: Extracción de Corpus Paralelos de la Wikipedia basada en la Obtención de Alineamientos Bilingües a Nivel de Frase (Extraction of Parallel Corpora from Wikipedia Based on Obtaining Phrase-Level Bilingual Alignments)
Author(s): Joan Albert Silvestre-Cerdà, Mercedes García-Martínez, Alberto Barrón-Cedeño, Jorge Civera, Paolo Rosso
Published in: Proceedings of the Workshop on Iberian Cross-Language Natural Language Processing Tasks (ICL 2011)
Language: Spanish
Date: 2011
Abstract: This paper presents a proposal for extracting parallel corpora from Wikipedia on the basis of statistical machine translation techniques. We have used IBM word-level alignment models in order to obtain phrase-level bilingual alignments between document pairs. We have manually annotated a set of test English-Spanish comparable documents in order to evaluate the model. The obtained results are encouraging.
R: 4  C: 0
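
The abstract above builds on IBM word-level alignment models. The following is a generic, self-contained IBM Model 1 EM sketch, not the authors' implementation, showing how word translation probabilities are estimated from sentence pairs.

```python
from collections import defaultdict

def train_ibm_model1(bitext, iterations=10):
    """bitext: list of (source_tokens, target_tokens) sentence pairs."""
    src_vocab = {w for s, _ in bitext for w in s}
    uniform = 1.0 / len(src_vocab)
    t = defaultdict(lambda: uniform)                # t[(src, tgt)] = P(src | tgt)
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in bitext:
            tgt = ["NULL"] + list(tgt)              # allow alignment to an empty word
            for s_w in src:
                z = sum(t[(s_w, t_w)] for t_w in tgt)
                for t_w in tgt:
                    c = t[(s_w, t_w)] / z           # expected fractional count (E-step)
                    count[(s_w, t_w)] += c
                    total[t_w] += c
        t = defaultdict(lambda: uniform,            # re-estimate probabilities (M-step)
                        {(s_w, t_w): c / total[t_w] for (s_w, t_w), c in count.items()})
    return t
```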

Title: No free lunch: Brute force vs. locality-sensitive hashing for cross-lingual pairwise similarity
Author(s): Ture F., Elsayed T., Lin J.
Published in: SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval
Language: English
Date: 2011
Abstract: This work explores the problem of cross-lingual pairwise similarity, where the task is to extract similar pairs of documents across two different languages. Solutions to this problem are of general interest for text mining in the multilingual context and have specific applications in statistical machine translation. Our approach takes advantage of cross-language information retrieval (CLIR) techniques to project feature vectors from one language into another, and then uses locality-sensitive hashing (LSH) to extract similar pairs. We show that effective cross-lingual pairwise similarity requires working with similarity thresholds that are much lower than in typical monolingual applications, making the problem quite challenging. We present a parallel, scalable MapReduce implementation of the sort-based sliding window algorithm, which is compared to a brute-force approach on German and English Wikipedia collections. Our central finding can be summarized as "no free lunch": there is no single optimal solution. Instead, we characterize effectiveness-efficiency tradeoffs in the solution space, which can guide the developer to locate a desirable operating point based on application- and resource-specific constraints.
R: 0  C: 0
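
As a sketch of the signature-plus-sliding-window idea summarized above (not the paper's MapReduce implementation), the code below hashes document vectors into bit signatures with random hyperplanes, sorts the signatures, and compares each one only with its neighbours in the sorted order, trading some recall for far fewer pairwise comparisons.

```python
import numpy as np

def signatures(vectors: np.ndarray, n_bits: int = 64, seed: int = 0) -> np.ndarray:
    """Random-hyperplane LSH: one bit per hyperplane, 1 if the dot product is positive."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((vectors.shape[1], n_bits))
    return (vectors @ planes > 0).astype(np.uint8)

def sliding_window_pairs(sigs: np.ndarray, window: int = 5):
    """Sort signatures lexicographically and compare each one only to its neighbours."""
    order = np.lexsort(sigs.T[::-1])                 # sort rows by bit pattern
    for idx, i in enumerate(order):
        for j in order[idx + 1:idx + 1 + window]:
            hamming = int(np.count_nonzero(sigs[i] != sigs[j]))
            yield i, j, 1.0 - hamming / sigs.shape[1]   # crude similarity estimate
```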

Title: Supporting Multilingual Discussion for Wikipedia Translation
Author(s): Noriyuki Ishida, Toshiyuki Takasaki, Masanobu Ishimatsu, Toru Ishida
Published in: CULTURE-COMPUTING
Language: English
Date: 2011
R: 0  C: 0

Title: Supporting multilingual discussion for Wikipedia translation
Author(s): Noriyuki Ishida, Toshiyuki Takasaki, Masanobu Ishimatsu, Toru Ishida
Published in: Proceedings - 2011 2nd International Conference on Culture and Computing, Culture and Computing 2011
Language: English
Date: 2011
Abstract: Wikipedia has become a useful resource on the Web, but the number of articles differs greatly from language to language. Some people try to increase these numbers through translation, which requires discussion because articles contain language-specific words or phrases. Translators can use machine translation to take part in such discussions in their own language, but this introduces some problems. In this paper, we present the "Meta Translation" algorithm, which keeps designated segments untranslated and attaches a description to them.
R: 0  C: 0
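
A minimal sketch of the "keep designated segments untranslated" idea from the abstract above, assuming segments are marked with a hypothetical inline [[keep:...]] markup and that an mt() callable is available; the placeholder scheme is an illustrative assumption, not the published Meta Translation Algorithm.

```python
import re

KEEP = re.compile(r"\[\[keep:(.*?)\]\]")      # hypothetical inline markup for protected segments

def translate_with_kept_segments(text: str, mt) -> str:
    """Mask marked segments, machine-translate the rest, then restore the originals."""
    kept = []
    def stash(match):
        kept.append(match.group(1))
        return f"__KEEP{len(kept) - 1}__"     # placeholder the MT engine should pass through
    masked = KEEP.sub(stash, text)
    translated = mt(masked)                   # any machine translation function
    for i, segment in enumerate(kept):
        translated = translated.replace(f"__KEEP{i}__", segment)
    return translated
```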

Title: Combining wikipedia-based concept models for cross-language retrieval
Author(s): Benjamin Roth, Dietrich Klakow
Published in: IRFC
Language: English
Date: 2010
R: 0  C: 0

Title: Extracting parallel sentences from comparable corpora using document level alignment
Author(s): Smith J.R., Quirk C., Toutanova K.
Published in: NAACL HLT 2010 - Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference
Language: English
Date: 2010
Abstract: The quality of a statistical machine translation (SMT) system is heavily dependent upon the amount of parallel sentences used in training. In recent years, there have been several approaches developed for obtaining parallel sentences from non-parallel, or comparable, data, such as news articles published within the same time period (Munteanu and Marcu, 2005), or web pages with a similar structure (Resnik and Smith, 2003). One resource not yet thoroughly explored is Wikipedia, an online encyclopedia containing linked articles in many languages. We advance the state of the art in parallel sentence extraction by modeling the document level alignment, motivated by the observation that parallel sentence pairs are often found in close proximity. We also include features which make use of the additional annotation given by Wikipedia, and features using an automatically induced lexicon model. Results for both accuracy in sentence extraction and downstream improvement in an SMT system are presented.
R: 0  C: 0

Title: Extracting bilingual word pairs from Wikipedia
Author(s): Tyers F., Pienaar J.
Published in: SALTMIL workshop at Language Resources and Evaluation Conference (LREC) 2008
Date: 2008
Abstract: A bilingual dictionary or word list is an important resource for many purposes, among them machine translation. For many language pairs these are either non-existent, or very often unavailable owing to licensing restrictions. We describe a simple, fast and computationally inexpensive method for extracting bilingual dictionary entries from Wikipedia (using the interwiki link system) and assess the performance of this method with respect to four language pairs. Precision was found to be in the 69-92% region, but open to improvement.
R: 0  C: 1
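
The interwiki-link approach described above can be reproduced in a few lines against the standard MediaWiki API (the paper itself may have worked from dumps rather than the live API); paging, batching limits, and error handling are omitted for brevity.

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def bilingual_pairs(titles, target_lang="es"):
    """Yield (English title, target-language title) pairs via interlanguage links."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": "|".join(titles),
        "lllang": target_lang,       # restrict langlinks to one target language
        "format": "json",
        "formatversion": 2,
    }
    data = requests.get(API, params=params, timeout=10).json()
    for page in data["query"]["pages"]:
        for link in page.get("langlinks", []):
            yield page["title"], link["title"]

# Example: list(bilingual_pairs(["Machine translation", "Dictionary"], "pl"))
```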

Title: Method for building sentence-aligned corpus from wikipedia
Author(s): Yasuda K., Eiichiro Sumita
Published in: AAAI Workshop - Technical Report
Language: English
Date: 2008
Abstract: We propose the framework of a Machine Translation (MT) bootstrapping method using multilingual Wikipedia articles. This novel method can simultaneously generate a statistical machine translation (SMT) system and a sentence-aligned corpus. In this study, we perform two types of experiments. The aim of the first type of experiments is to verify the sentence alignment performance by comparing the proposed method with a conventional sentence alignment approach. For the first type of experiments, we use JENAAD, which is a sentence-aligned corpus built by the conventional sentence alignment method. The second type of experiments uses actual English and Japanese Wikipedia articles for sentence alignment. The result of the first type of experiments shows that the performance of the proposed method is comparable to that of the conventional sentence alignment method. Additionally, the second type of experiments shows that we can obtain the English translation of 10% of Japanese sentences while maintaining high alignment quality (rank-A ratio of over 0.8).
R: 0  C: 1

Title: Simultaneous multilingual search for translingual information retrieval
Author(s): Parton K., McKeown K.R., Allan J., Henestroza E.
Published in: International Conference on Information and Knowledge Management, Proceedings
Language: English
Date: 2008
Abstract: We consider the problem of translingual information retrieval, where monolingual searchers issue queries in a different language than the document language(s) and the results must be returned in the language they know, the query language. We present a framework for translingual IR that integrates document translation and query translation into the retrieval model. The corpus is represented as an aligned, jointly indexed "pseudo-parallel" corpus, where each document contains the text of the document along with its translation into the query language. The queries are formulated as multilingual structured queries, where each query term and its translations into the document language(s) are treated as synonym sets. This model leverages simultaneous search in multiple languages against jointly indexed documents to improve the accuracy of results over search using document translation or query translation alone. For query translation, we compared a statistical machine translation (SMT) approach to a dictionary-based approach. We found that using a Wikipedia-derived dictionary for named entities combined with an SMT-based dictionary worked better than SMT alone. Simultaneous multilingual search also has other important features suited to translingual search, since it can provide an indication of poor document translation when a match with the source document is found. We show how close integration of CLIR and SMT allows us to improve result translation in addition to IR results.
R: 0  C: 0
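
A toy sketch of the multilingual structured-query idea from the abstract above: each query term and its translations form a synonym set, and a jointly indexed document is scored by how many synonym sets it satisfies. The scoring is a deliberately simplified stand-in, not the paper's retrieval model.

```python
def structured_query(term_translations):
    """term_translations: dict mapping a query term to an iterable of its translations."""
    return [{term} | set(translations) for term, translations in term_translations.items()]

def score(document_tokens, query):
    """Count how many synonym sets are matched by the jointly indexed document."""
    tokens = set(document_tokens)
    return sum(1 for syn_set in query if syn_set & tokens)

# Example (hypothetical terms and translations):
#   query = structured_query({"treaty": {"tratado"}, "border": {"frontera"}})
#   score("el tratado fija la frontera treaty establishes the border".split(), query)
```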

Title: WikiBABEL: Community creation of multilingual data
Author(s): Kumaran A., Saravanan K., Maurice S.
Published in: WikiSym 2008 - The 4th International Symposium on Wikis, Proceedings
Language: English
Date: 2008
Abstract: In this paper, we present a collaborative framework - wikiBABEL - for the efficient and effective creation of multilingual content by a community of users. The wikiBABEL framework leverages the availability of fairly stable content in a source language (typically English) and a reasonable, not necessarily perfect, machine translation system between the source language and a given target language to create rough initial content in the target language, which is published on a collaborative platform. The platform provides an intuitive user interface and a set of linguistic tools for collaborative correction of the rough content by a community of users, aiding the creation of clean content in the target language. We describe the architectural components implementing the wikiBABEL framework, namely, the systems for source and target language content management, the mechanisms for coordination and collaboration, and an intuitive user interface for multilingual editing and review. Importantly, we discuss the integrated linguistic resources and tools, such as bilingual dictionaries and machine translation and transliteration systems, that help the users during the content correction and creation process. In addition, we analyze and present the prime factors - user-interface features or linguistic tools and resources - that significantly influence the user experience in multilingual content creation. Beyond the creation of multilingual content, another significant motivation for the wikiBABEL framework is the creation of parallel corpora as a by-product. Parallel linguistic corpora are very valuable resources for both Statistical Machine Translation (SMT) and Cross-lingual Information Retrieval (CLIR) research, and may be mined effectively from multilingual data with significant content overlap, as may be created in the wikiBABEL framework. Creation of parallel corpora by professional translators is very expensive, and hence SMT and CLIR research has been largely confined to a handful of languages. Our attempt to engage the large and diverse Internet user population may aid the creation of such linguistic resources economically, and may make computational linguistics research possible and practical in many languages of the world.
R: 0  C: 0