Measuring comparability of multilingual corpora extracted from Wikipedia
|Author(s)||Otero P.G., Lopez I.G.|
|Published in||CEUR Workshop Proceedings|
|Keyword(s)||Bilingual lexicons, Comparability, Comparable corpora, Information extraction (Extra: Bilingual lexicon extraction, Wikipedia, Information retrieval, Natural language processing systems, Linguistics)|
Comparable corpora can be used for many linguistic tasks such as bilingual lexicon extraction. By improving the quality of comparable corpora, we improve the quality of the extraction. This article describes some strategies to build comparable corpora from Wikipedia and proposes a measure of comparability. Experiments were performed on Portuguese, Spanish, and English Wikipedia.
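The abstract does not define the proposed measure, so the following is only a generic illustration of what a comparability score between two corpora can look like, not the authors' method. It is a minimal sketch of a Dice-style vocabulary overlap computed through a bilingual dictionary. The function name `comparability`, the toy dictionary, and the token lists are all hypothetical.

```python
def comparability(src_tokens, tgt_tokens, bilingual_dict):
    """Dice-style overlap between two corpora, in [0, 1].

    Illustrative assumption only: this is NOT the measure proposed in
    the paper. bilingual_dict maps a source word to a set of candidate
    target-language translations.
    """
    src_vocab = set(src_tokens)
    tgt_vocab = set(tgt_tokens)

    # Source words with at least one translation found in the target corpus.
    src_matched = {
        w for w in src_vocab
        if bilingual_dict.get(w, set()) & tgt_vocab
    }

    # Target words that are a translation of some source-corpus word.
    all_translations = set()
    for w in src_vocab:
        all_translations |= bilingual_dict.get(w, set())
    tgt_matched = tgt_vocab & all_translations

    denom = len(src_vocab) + len(tgt_vocab)
    if denom == 0:
        return 0.0  # two empty corpora: define the score as 0
    return (len(src_matched) + len(tgt_matched)) / denom


# Toy Spanish-English example (hypothetical data).
toy_dict = {
    "gato": {"cat"},
    "perro": {"dog"},
    "casa": {"house", "home"},
}
score = comparability(
    ["gato", "perro", "casa"],
    ["cat", "house", "tree"],
    toy_dict,
)
print(round(score, 3))
```

A score near 1 means most of the vocabulary on each side has a translation counterpart on the other. Under this sketch, filtering article pairs so that the score rises is one way to read the claim that better comparable corpora lead to better lexicon extraction.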