Tibetan-Chinese named entity extraction based on comparable corpus
|Author(s)||Sun Y., Zhao Q.|
|Published in||Applied Mechanics and Materials|
|Keyword(s)||Comparable corpus, Sequence intersection, Tibetan-Chinese named entity, Wikipedia (Extra: Computational linguistics, Data mining, Comparable corpora, Cross language information retrieval, Machine translations, Named entities, Named entity extraction, Sentence alignment, Sequence intersections, Wikipedia, Natural language processing systems)|
Tibetan-Chinese named entity extraction is the foundation of Tibetan-Chinese information processing and provides the basis for research on machine translation and cross-language information retrieval. We used the multilingual (interlanguage) links of Wikipedia to obtain a Tibetan-Chinese comparable corpus, and aligned its sentences by combining sentence length, word matching and entity boundary words. We then extracted Tibetan-Chinese named entities from the aligned comparable corpus in three ways: (1) extraction from natural labeling information; (2) extraction from the links between Tibetan entries and Chinese entries; (3) sequence intersection, which treats each sentence as a word sequence, recognizes Chinese named entities in the Chinese sentences and intersects the aligned Tibetan sentences. Finally, experimental results show that the extraction method based on the comparable corpus is effective.
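The abstract names three alignment features (sentence length, word matching, entity boundary words) but not how they are combined. The sketch below assumes a simple weighted sum. The weights, the length ratio, the threshold, the toy bilingual lexicon and the boundary-word table are all hypothetical placeholders, not values from the paper. Chinese sentences are assumed to be pre-segmented into words and Tibetan sentences into tokens.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical resources: Chinese word -> possible Tibetan tokens,
# and Chinese entity boundary suffix -> Tibetan counterpart token.
Lexicon = Dict[str, Set[str]]
Boundary = Dict[str, str]


def length_score(zh_chars: int, bo_tokens: int, ratio: float = 1.0) -> float:
    """1.0 when bo_tokens == ratio * zh_chars, shrinking as lengths diverge."""
    if zh_chars == 0 or bo_tokens == 0:
        return 0.0
    expected = ratio * zh_chars
    return min(expected, bo_tokens) / max(expected, bo_tokens)


def match_score(zh_words: List[str], bo_tokens: List[str], lexicon: Lexicon) -> float:
    """Fraction of dictionary-known Chinese words whose translation occurs in the Tibetan sentence."""
    known = [w for w in zh_words if w in lexicon]
    if not known:
        return 0.0
    bo_set = set(bo_tokens)
    return sum(1 for w in known if lexicon[w] & bo_set) / len(known)


def boundary_score(zh_words: List[str], bo_tokens: List[str], boundary: Boundary) -> float:
    """Fraction of entity boundary suffixes seen in the Chinese sentence whose Tibetan counterpart also appears."""
    bo_set = set(bo_tokens)
    found = [suf for suf in boundary if any(w.endswith(suf) for w in zh_words)]
    if not found:
        return 0.0
    return sum(1 for suf in found if boundary[suf] in bo_set) / len(found)


def pair_score(zh_words: List[str], bo_tokens: List[str], lexicon: Lexicon,
               boundary: Boundary, weights: Tuple[float, float, float] = (0.2, 0.5, 0.3)) -> float:
    """Weighted sum of the three features (weights are assumptions)."""
    a, b, c = weights
    zh_chars = sum(len(w) for w in zh_words)
    return (a * length_score(zh_chars, len(bo_tokens))
            + b * match_score(zh_words, bo_tokens, lexicon)
            + c * boundary_score(zh_words, bo_tokens, boundary))


def align(zh_sents: List[List[str]], bo_sents: List[List[str]], lexicon: Lexicon,
          boundary: Boundary, threshold: float = 0.5) -> List[Tuple[int, int]]:
    """For each Chinese sentence, keep the best-scoring Tibetan sentence if it clears the threshold."""
    pairs = []
    for i, zh in enumerate(zh_sents):
        scored = [(pair_score(zh, bo, lexicon, boundary), j) for j, bo in enumerate(bo_sents)]
        if not scored:
            continue
        best, j = max(scored)
        if best >= threshold:
            pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    lexicon = {"拉萨": {"BO_LHASA"}, "北京": {"BO_BEIJING"}}
    boundary = {"寺": "BO_MONASTERY"}
    zh_sents = [["拉萨", "有", "大昭寺"], ["北京", "很", "大"]]
    bo_sents = [["BO_BEIJING", "x", "y", "z"], ["BO_LHASA", "p", "BO_MONASTERY", "q"]]
    print(align(zh_sents, bo_sents, lexicon, boundary))
```

Choosing the best candidate independently for each Chinese sentence keeps the sketch short; a real aligner would more likely restrict candidates to a window or enforce monotone (order-preserving) alignment.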
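One possible reading of the sequence intersection method: every Chinese sentence that contains a recognized Chinese entity contributes its aligned Tibetan sentence, and the word run shared by all of those Tibetan sentences is taken as the candidate Tibetan form of the entity. The sketch below assumes this reading. The Chinese NER step is not shown: `chinese_entities` is simply a given list. The "at least two sentences" rule and the greedy pairwise fold (which can miss the true longest run common to three or more sentences) are simplifications of our own. The function names are hypothetical, and the Tibetan tokens are Wylie-romanized placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def longest_common_run(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Longest contiguous run of tokens shared by a and b (O(len(a)*len(b)) dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return list(a[best_end - best_len:best_end])


def intersect_sequences(tibetan_sentences: List[List[str]]) -> List[str]:
    """Greedily fold longest_common_run over all Tibetan sentences."""
    if not tibetan_sentences:
        return []
    common = tibetan_sentences[0]
    for sent in tibetan_sentences[1:]:
        common = longest_common_run(common, sent)
        if not common:
            break
    return list(common)


def extract_entity_pairs(aligned: List[Tuple[str, List[str]]],
                         chinese_entities: List[str]) -> Dict[str, List[str]]:
    """Map each Chinese entity to the token run shared by its aligned Tibetan sentences."""
    candidates: Dict[str, List[List[str]]] = defaultdict(list)
    for zh_sentence, bo_tokens in aligned:
        for entity in chinese_entities:
            if entity in zh_sentence:
                candidates[entity].append(bo_tokens)
    result = {}
    for entity, sentences in candidates.items():
        if len(sentences) >= 2:  # a single sentence gives nothing to intersect
            common = intersect_sequences(sentences)
            if common:
                result[entity] = common
    return result


if __name__ == "__main__":
    aligned = [
        ("拉萨是西藏的首府", ["lha", "sa", "X1", "X2"]),
        ("他去了拉萨", ["kho", "lha", "sa", "X3"]),
    ]
    print(extract_entity_pairs(aligned, ["拉萨", "西藏"]))
```

In this toy input, "西藏" occurs in only one sentence and is therefore skipped, while "拉萨" is paired with the shared run `lha sa`.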