Method for building sentence-aligned corpus from Wikipedia

Method for building sentence-aligned corpus from Wikipedia is a 2008 conference paper written in English by Yasuda K. and Sumita E., published in AAAI Workshop - Technical Report.

Abstract

We propose a framework for a Machine Translation (MT) bootstrapping method that uses multilingual Wikipedia articles. This novel method can simultaneously generate a statistical machine translation (SMT) system and a sentence-aligned corpus. In this study, we perform two types of experiments. The first type verifies sentence alignment performance by comparing the proposed method with a conventional sentence alignment approach; for these experiments we use JENAAD, a sentence-aligned corpus built by the conventional sentence alignment method. The second type uses actual English and Japanese Wikipedia articles for sentence alignment. The results of the first type of experiments show that the performance of the proposed method is comparable to that of the conventional sentence alignment method. Additionally, the second type of experiments shows that we can obtain the English translation of 10% of Japanese sentences while maintaining high alignment quality (rank-A ratio of over 0.8).
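
The abstract does not spell out the procedure, so the following is only a minimal sketch of how a bootstrapping loop of this general kind could be organized, not the authors' implementation. It assumes a toy word-table "translator" in place of a real SMT system, a Jaccard word-overlap score in place of a real translation-similarity measure, and whitespace-tokenized placeholder sentences. All function names, the threshold, and the data are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]

def overlap_score(hyp: str, ref: str) -> float:
    """Jaccard overlap of word sets (assumed stand-in for a real similarity metric)."""
    a, b = set(hyp.lower().split()), set(ref.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0

def train_word_table(pairs: List[Pair]) -> Callable[[str], str]:
    """Toy 'MT training': learn positional word mappings from equal-length pairs."""
    table: Dict[str, str] = {}
    for src, tgt in pairs:
        s, t = src.split(), tgt.split()
        if len(s) == len(t):
            table.update(zip(s, t))
    # Unknown words are passed through unchanged.
    return lambda sent: " ".join(table.get(w, w) for w in sent.split())

def align_once(src: List[str], tgt: List[str],
               translate: Callable[[str], str], threshold: float) -> List[Pair]:
    """Pair each source sentence with its best-scoring target, if above threshold."""
    pairs: List[Pair] = []
    for s in src:
        hyp = translate(s)
        best = max(tgt, key=lambda t: overlap_score(hyp, t), default=None)
        if best is not None and overlap_score(hyp, best) >= threshold:
            pairs.append((s, best))
    return pairs

def bootstrap(src: List[str], tgt: List[str], seed: List[Pair],
              threshold: float = 0.3, rounds: int = 3) -> List[Pair]:
    """Alternate between retraining the translator and re-aligning sentences."""
    corpus = list(seed)
    for _ in range(rounds):
        translate = train_word_table(corpus)
        new_pairs = align_once(src, tgt, translate, threshold)
        merged = list(dict.fromkeys(corpus + new_pairs))  # de-duplicate, keep order
        if len(merged) == len(corpus):  # nothing new was aligned: stop early
            break
        corpus = merged
    return corpus

# Hypothetical toy data: romanized placeholder tokens, not real Wikipedia text.
seed = [("inu hashiru", "dog runs")]
src = ["neko hashiru", "inu nemuru", "neko nemuru"]
tgt = ["cat runs", "dog sleeps", "cat sleeps", "bird flies"]
aligned = bootstrap(src, tgt, seed)
```

In this toy run, the pair ("neko nemuru", "cat sleeps") cannot be found in the first round, because the seed translator knows neither word; it becomes alignable only after the first round's pairs are fed back into training. That feedback is the point of the bootstrapping idea, although a real system would need a proper SMT toolkit, Japanese word segmentation, and a more robust similarity score.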

Cited by

This publication has 1 citation. Only those publications available in WikiPapers are shown here: