Language pairs
From WikiPapers
Language pairs is included as a keyword or extra keyword in 0 datasets, 0 tools and 8 publications.
Datasets
There are no datasets for this keyword.
Tools
There are no tools for this keyword.
Publications
Title | Author(s) | Published in | Language | Date | Abstract | R | C |
---|---|---|---|---|---|---|---|
MDL-based models for transliteration generation | Nouri J., Pivovarova L., Yangarber R. | Lecture Notes in Computer Science | English | 2013 | This paper presents models for automatic transliteration of proper names between languages that use different alphabets. The models are an extension of our work on automatic discovery of patterns of etymological sound change, based on the Minimum Description Length Principle. The models for pairwise alignment are extended with algorithms for prediction that produce transliterated names. We present results on 13 parallel corpora for 7 languages, including English, Russian, and Farsi, extracted from Wikipedia headlines. The transliteration corpora are released for public use. The models achieve up to 88% on word-level accuracy and up to 99% on symbol-level F-score. We discuss the results from several perspectives, and analyze how corpus size, the language pair, the type of names (persons, locations), and noise in the data affect the performance. | 0 | 0 |
Wikipedia as an SMT training corpus | Tufis D., Ion R., Dumitrescu S.D., Stefanescu D. | International Conference Recent Advances in Natural Language Processing, RANLP | English | 2013 | This article reports on mass experiments supporting the idea that data extracted from strongly comparable corpora may successfully be used to build statistical machine translation systems of reasonable translation quality for in-domain new texts. The experiments were performed for three language pairs: Spanish-English, German-English and Romanian-English, based on large bilingual corpora of similar sentence pairs extracted from the entire dumps of Wikipedia as of June 2012. Our experiments and comparison with similar work show that adding indiscriminately more data to a training corpus is not necessarily a good thing in SMT. | 0 | 0 |
Exploiting a web-based encyclopedia as a knowledge base for the extraction of multilingual terminology | Sadat F. | Lecture Notes in Computer Science | English | 2012 | Multilingual linguistic resources are usually constructed from parallel corpora, but since these corpora are available only for selected text domains and language pairs, the potential of other resources is being explored as well. This article seeks to explore and to exploit the idea of using multilingual web-based encyclopaedias such as Wikipedia as comparable corpora for bilingual terminology extraction. We propose an approach to extract terms and their translations from different types of Wikipedia link information and data. The next step will be using linguistic-based information to re-rank and filter the extracted term candidates in the target language. Preliminary evaluations using the combined statistics-based and linguistic-based approaches were applied on different pairs of languages including Japanese, French and English. These evaluations showed a real improvement and a good quality of the extracted term candidates for building or enriching multilingual ontologies, dictionaries or feeding a cross-language information retrieval system with the related expansion terms of the source query. | 0 | 0 |
Extracting the multilingual terminology from a web-based encyclopedia | Fatiha S. | Proceedings - International Conference on Research Challenges in Information Science | English | 2011 | Multilingual linguistic resources are usually constructed from parallel corpora, but since these corpora are available only for selected text domains and language pairs, the potential of other resources is being explored as well. This article seeks to explore and to exploit the idea of using multilingual web-based encyclopedias such as Wikipedia as comparable corpora for bilingual terminology extraction. We propose an approach to extract terms and their translations from different types of Wikipedia link information and data. The next step will be using linguistic-based information to re-rank and filter the extracted term candidates in the target language. Preliminary evaluations using the combined statistics-based and linguistic-based approaches were applied on different pairs of languages including Japanese, French and English. These evaluations showed a real improvement and a good quality of the extracted term candidates for building or enriching multilingual ontologies, dictionaries or feeding a cross-language information retrieval system with the related expansion terms of the source query. | 0 | 0 |
Improved transliteration mining using graph reinforcement | El-Kahky A., Darwish K., Aldein A.S., El-Wahab M.A., Hefny A., Ammar W. | EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference | English | 2011 | Mining of transliterations from comparable or parallel text can enhance natural language processing applications such as machine translation and cross language information retrieval. This paper presents an enhanced transliteration mining technique that uses a generative graph reinforcement model to infer mappings between source and target character sequences. An initial set of mappings is learned through automatic alignment of transliteration pairs at character sequence level. Then, these mappings are modeled using a bipartite graph. A graph reinforcement algorithm is then used to enrich the graph by inferring additional mappings. During graph reinforcement, appropriate link reweighting is used to promote good mappings and to demote bad ones. The enhanced transliteration mining technique is tested in the context of mining transliterations from parallel Wikipedia titles in four alphabet-based language pairs, namely English-Arabic, English-Russian, English-Hindi, and English-Tamil. The improvements in F1-measure over the baseline system were 18.7, 1.0, 4.5, and 32.5 basis points for the four language pairs respectively. The results herein outperform the best reported results in the literature by 2.6, 4.8, 0.8, and 4.1 basis points for the four language pairs respectively. | 0 | 0 |
Mining transliterations from Wikipedia using Dynamic Bayesian networks | Nabende P. | International Conference Recent Advances in Natural Language Processing, RANLP | English | 2011 | Transliteration mining is aimed at building high quality multi-lingual named entity (NE) lexicons for improving performance in various Natural Language Processing (NLP) tasks including Machine Translation (MT) and Cross Language Information Retrieval (CLIR). In this paper, we apply two Dynamic Bayesian network (DBN)-based edit distance (ED) approaches in mining transliteration pairs from Wikipedia. Transliteration identification results on standard corpora for seven language pairs suggest that the DBN-based edit distance approaches are suitable for modeling transliteration similarity. An evaluation on mining transliteration pairs from English-Hindi and English-Tamil Wikipedia topic pairs shows that they improve transliteration mining quality over state-of-the-art approaches. | 0 | 0 |
Exploiting a multilingual web-based encyclopedia for bilingual terminology extraction | Sadat F. | PACLIC 24 - Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation | English | 2010 | Multilingual linguistic resources are usually constructed from parallel corpora, but since these corpora are available only for selected text domains and language pairs, the potential of other resources is being explored as well. This article seeks to explore and to exploit the idea of using multilingual web-based encyclopedias such as Wikipedia as comparable corpora for bilingual terminology extraction. We propose an approach to extract terms and their translations from different types of Wikipedia link information and data. The next step will be using linguistic-based information to re-rank and filter the extracted term candidates in the target language. Preliminary evaluations using the combined statistics-based and linguistic-based approaches were applied on different pairs of languages including Japanese, French and English. These evaluations showed a real improvement and a good quality of the extracted term candidates for building or enriching multilingual ontologies, dictionaries or feeding a cross-language information retrieval system with the related expansion terms of the source query. | 0 | 0 |
Cross-lingual semantic relatedness using encyclopedic knowledge | Hassan S., Mihalcea R. | EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009 | English | 2009 | In this paper, we address the task of cross-lingual semantic relatedness. We introduce a method that relies on the information extracted from Wikipedia, by exploiting the interlanguage links available between Wikipedia versions in multiple languages. Through experiments performed on several language pairs, we show that the method performs well, with a performance comparable to monolingual measures of relatedness. | 0 | 0 |
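
Several of the publications above (the Sadat terminology-extraction papers and the Hassan and Mihalcea relatedness paper) start from Wikipedia's interlanguage links. As a minimal sketch of that harvesting step, assuming only the public MediaWiki API, the following Python snippet collects (English title, target-language title) pairs; the seed titles and the choice of French as the target language are illustrative placeholders.

```python
# Minimal sketch: harvest bilingual title pairs from Wikipedia's
# interlanguage links via the public MediaWiki API. The seed titles
# and target language are illustrative, not from any of the papers.
import requests

API = "https://en.wikipedia.org/w/api.php"

def langlink_pairs(titles, target_lang="fr"):
    """Return (English title, target-language title) pairs."""
    params = {
        "action": "query",
        "prop": "langlinks",         # interlanguage links of each page
        "titles": "|".join(titles),
        "lllang": target_lang,       # restrict to one target language
        "lllimit": "max",
        "format": "json",
    }
    pages = requests.get(API, params=params, timeout=10).json()["query"]["pages"]
    return [(page["title"], link["*"])   # "*" holds the linked title
            for page in pages.values()
            for link in page.get("langlinks", [])]

print(langlink_pairs(["Machine translation", "Parallel text"]))
# e.g. [('Machine translation', 'Traduction automatique'), ...]
```

Title pairs mined this way are only raw candidates; the re-ranking and filtering stages described in the abstracts are where the individual papers differ.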
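At mining time, the transliteration papers above (Nouri et al., El-Kahky et al., Nabende) all come down to scoring candidate name pairs with a character-level similarity and keeping the high-scoring ones. The sketch below substitutes plain Levenshtein distance for their learned models (MDL alignments, graph reinforcement, DBN-based edit distance) and assumes both names are in the same script, e.g. after romanization; it is a baseline illustration, not any paper's actual method.

```python
# Baseline sketch of transliteration-pair mining: keep candidate
# (source, target) name pairs whose length-normalized character-level
# similarity clears a threshold. Plain Levenshtein distance stands in
# for the learned similarity models used in the papers above.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Length-normalized similarity in [0, 1]."""
    return 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)

def mine(candidates, threshold=0.7):
    """Filter candidate (source, target) pairs by similarity."""
    return [(s, t) for s, t in candidates if similarity(s, t) >= threshold]

print(mine([("Milan", "Milano"), ("Moscow", "Tokyo")]))
# keeps ('Milan', 'Milano'); the mismatched pair is dropped
```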
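The Hassan and Mihalcea paper measures cross-lingual relatedness by representing words as weighted vectors over Wikipedia concepts and mapping the concept dimensions across languages through interlanguage links. A toy sketch of that comparison step follows, with invented vectors and an invented link table standing in for real Wikipedia-derived data.

```python
# Toy sketch: cross-lingual relatedness as cosine similarity between
# concept vectors, after projecting one language's concepts onto the
# other's via interlanguage links. All values below are invented.
from math import sqrt

# word -> {Wikipedia concept: weight}, one vector per language (toy)
en_vec = {"Machine translation": 0.8, "Linguistics": 0.3}
es_vec = {"Traducción automática": 0.7, "Lingüística": 0.4}

# Spanish concept -> English concept, via interlanguage links (toy)
interlang = {"Traducción automática": "Machine translation",
             "Lingüística": "Linguistics"}

def cross_lingual_relatedness(a, b, link_map):
    """Cosine similarity after projecting b's concepts into a's space."""
    projected = {link_map[c]: w for c, w in b.items() if c in link_map}
    dot = sum(w * projected.get(c, 0.0) for c, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in projected.values()))
    return dot / (na * nb) if na and nb else 0.0

print(round(cross_lingual_relatedness(en_vec, es_vec, interlang), 3))
```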