Chinese characters conversion system based on lookup table and language model

Chinese characters conversion system based on lookup table and language model is a 2010 conference paper written in Chinese by Li M.-H., Wu S.-H., Yang P.-C., and Ku T., published in the Proceedings of the 22nd Conference on Computational Linguistics and Speech Processing (ROCLING 2010).

Abstract

China and Taiwan both write in Chinese, but they use different character sets: simplified and traditional Chinese characters. A large amount of information is exchanged between China and Taiwan through books and the Internet. To give readers a convenient reading environment, conversion between simplified and traditional Chinese characters is necessary. This conversion faces two problems: one-to-many ambiguity and differences in term usage. Because many traditional Chinese characters share a single corresponding simplified character, a system converting simplified Chinese into traditional Chinese faces one-to-many ambiguity. In addition, many terms are used differently in the two Chinese-speaking societies. This paper focuses on designing an extensible conversion system that takes advantage of community knowledge by accumulating lookup tables from Wikipedia to tackle the term usage problem, and that integrates a language model to resolve the one-to-many ambiguity. The system can reduce the cost of proofreading converted text for books, e-books, and online publications. The extensible architecture makes it easy to improve the system with new training data.
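The two-stage idea in the abstract (a term lookup table followed by language-model disambiguation) can be illustrated with a minimal sketch. This is not the authors' implementation, which this page does not describe. The tables `TERM_TABLE` and `CHAR_TABLE`, the toy `BIGRAM_COUNTS`, and the function names are all hypothetical, and the add-one log score is a stand-in for a trained language model.

```python
import math

# Hypothetical term table: simplified term -> term as used in Taiwan.
# The paper accumulates such tables from Wikipedia; these entries are toy data.
TERM_TABLE = {"软件": "軟體", "鼠标": "滑鼠"}

# Hypothetical character table: simplified character -> traditional candidates.
# "发" shows the one-to-many case (發 "to emit", 髮 "hair").
CHAR_TABLE = {"头": ["頭"], "发": ["發", "髮"], "现": ["現"]}

# Toy bigram counts standing in for a trained language model (assumption).
BIGRAM_COUNTS = {("頭", "髮"): 50, ("發", "現"): 80, ("理", "髮"): 30}


def bigram_score(prev_char, cur_char):
    """Unnormalized add-one log score for a character bigram."""
    return math.log(BIGRAM_COUNTS.get((prev_char, cur_char), 0) + 1)


def segment(text):
    """Greedy longest match against the term table, then per-character lookup.

    Returns a list of slots; each slot is a list of candidate output strings.
    """
    slots = []
    max_len = max(map(len, TERM_TABLE), default=1)
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if piece in TERM_TABLE:
                slots.append([TERM_TABLE[piece]])  # term usage: one fixed output
                i += n
                break
        else:
            ch = text[i]
            slots.append(CHAR_TABLE.get(ch, [ch]))  # unknown chars pass through
            i += 1
    return slots


def convert(text):
    """Pick the best candidate per slot with a Viterbi search over bigram scores."""
    slots = segment(text)
    if not slots:
        return ""
    # best[candidate] = (score, output so far) for the best path ending there
    best = {c: (0.0, c) for c in slots[0]}
    for cands in slots[1:]:
        nxt = {}
        for c in cands:
            score, out = max(
                ((s + bigram_score(p[-1], c[0]), o) for p, (s, o) in best.items()),
                key=lambda t: t[0],
            )
            nxt[c] = (score, out + c)
        best = nxt
    return max(best.values(), key=lambda t: t[0])[1]


if __name__ == "__main__":
    for sample in ["头发", "发现", "理发软件"]:
        print(sample, "->", convert(sample))
```

In this sketch, extending the system amounts to adding entries to the tables or replacing the counts with ones estimated from new training text, which loosely mirrors the extensibility the abstract describes.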

References

This publication has 1 reference. Only references related to wikis are included here.

Cited by

This publication is probably cited by other works, but no articles are available for them in WikiPapers. Cited 1 time.