Chinese characters conversion system based on lookup table and language model
Abstract The character sets used in China and Taiwan are both Chinese, but they are divided into simplified and traditional Chinese characters. A large amount of information is exchanged between China and Taiwan through books and the Internet. To give readers a convenient reading environment, conversion between simplified and traditional Chinese characters is necessary. This conversion faces two problems: one-to-many ambiguity and differences in term usage. Because many traditional Chinese characters share a single corresponding simplified character, a system converting simplified Chinese into traditional Chinese faces one-to-many ambiguity. In addition, many terms are used differently in the two Chinese-speaking societies. This paper focuses on designing an extensible conversion system that takes advantage of community knowledge by accumulating lookup tables through Wikipedia to tackle the term usage problem, and that integrates a language model to resolve the one-to-many ambiguity. The system can reduce the cost of proofreading character conversion for books, e-books, and online publications. The extensible architecture makes it easy to improve the system with new training data.
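The two-stage idea in the abstract (term lookup first, then language-model disambiguation of one-to-many characters) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tables, the bigram counts, the add-one smoothing, and the names `TERM_TABLE`, `CHAR_TABLE`, `BIGRAM_COUNTS`, `segment`, and `convert` are all hypothetical, and a toy bigram model stands in for whatever language model the paper actually uses.

```python
import math

# Hypothetical toy tables; a real system would accumulate far larger ones.
TERM_TABLE = {"软件": "軟體", "鼠标": "滑鼠"}   # region-specific term usage
CHAR_TABLE = {                                   # simplified -> traditional candidates
    "发": ["發", "髮"],                          # one-to-many ambiguity
    "头": ["頭"],
    "现": ["現"],
}
# Hypothetical bigram counts standing in for a trained language model.
BIGRAM_COUNTS = {("頭", "髮"): 50, ("發", "現"): 40}


def bigram_logprob(prev, cur):
    """Score a character pair with add-one smoothing (an assumed choice)."""
    return math.log(BIGRAM_COUNTS.get((prev, cur), 0) + 1)


def segment(text):
    """Split text into slots, each a list of candidate outputs.

    Longest-match term lookup is tried first; unmatched characters fall
    back to the character table (or pass through unchanged).
    """
    max_len = max(map(len, TERM_TABLE))
    slots, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if piece in TERM_TABLE:
                slots.append([TERM_TABLE[piece]])
                i += n
                break
        else:
            slots.append(CHAR_TABLE.get(text[i], [text[i]]))
            i += 1
    return slots


def convert(text):
    """Pick the highest-scoring candidate sequence with a Viterbi search."""
    best = {"": (0.0, "")}  # last candidate -> (score, output so far)
    for candidates in segment(text):
        nxt = {}
        for cand in candidates:
            score, out = max(
                ((s + bigram_logprob(prev[-1:], cand[0]), o)
                 for prev, (s, o) in best.items()),
                key=lambda t: t[0],
            )
            nxt[cand] = (score, out + cand)
        best = nxt
    return max(best.values(), key=lambda t: t[0])[1]


print(convert("头发"), convert("发现"), convert("鼠标软件"))
```

In this sketch the same simplified character 发 is resolved differently depending on its neighbour, while whole terms found in the lookup table bypass character-level conversion entirely. Extending the system then amounts to adding table entries or retraining the counts.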
Bibtextype inproceedings  +
Has author Li M.-H. + , Wu S.-H. + , Yang P.-C. + , Ku T. +
Has extra keyword Chinese character conversions + , Chinese characters + , Conversion systems + , Language model + , Large amounts + , Online publications + , Training data + , Wikipedia + , Character sets + , Computational linguistics + , Speech processing + , Table lookup + , Electronic document exchange +
Has keyword Chinese character conversion + , Language model + , Lookup table + , Wikipedia +
Has reference Zh-cn +
Language Chinese +
Number of citations by publication 0  +
Number of references by publication 1  +
Pages 113–127  +
Published in Proceedings of the 22nd Conference on Computational Linguistics and Speech Processing, ROCLING 2010 +
Title Chinese characters conversion system based on lookup table and language model +
Type conference paper  +
Year 2010 +
Creation date 7 November 2014 05:55:28  +
Categories Publications without license parameter  + , Publications without DOI parameter  + , Publications without remote mirror parameter  + , Publications without archive mirror parameter  + , Publications without paywall mirror parameter  + , Conference papers  + , Publications  +
Modification date 7 November 2014 05:55:28  +
Date 2010  +