The translation mining of the out of Vocabulary based on Wikipedia
The translation mining of the out of Vocabulary based on Wikipedia is a 2011 journal article written in Chinese by Sun C., Hong Y., Ge Y., Yao J., Zhu Q. and published in Jisuanji Yanjiu yu Fazhan/Computer Research and Development.
The query translation is one of the key factors that affect the performance of cross-language information retrieval (CLIR). In the process of querying, the excavation of the out of vocabulary (OOV) has the important significance to improve CLIRT. Out of Vocabulary means the words or phrase which can't be found in the dictionary. In this paper, according to Wikipedia data structure and language features, we divide translation environment into target-existence environment and target-deficit environment. Depending on the difficulty of translation mining in the target-deficit environment, we adopt the frequency change information and adjacency information to realize the extraction of candidate units, and compare common extraction methods of units. The results verify that our methods are more effective. We establish the strategy of mixed translation mining based on the frequency-distance model, surface pattern matching model and summary-score model, and add the model one by one, and then verify the function influence of each model. The experiments use the mining technique of OOV in search engine as baseline and then evaluate the results with TOP1. The results verify that the mixed translation mining method based on Wikipedia can achieve the correct translation rate of 0.6822, and the improvements on this method are 6.98% over the baseline.
- This section requires expansion. Please, help!
Probably, this publication is cited by others, but there are no articles available for them in WikiPapers.