Shrinking digital gap through automatic generation of WordNet for Indian languages
|Shrinking digital gap through automatic generation of WordNet for Indian languages|
|Author(s)||Jain A., Tayal D.K., Rai S.|
|Published in||AI & SOCIETY|
|Keyword(s)||Computational lexicon, Indian languages, Statistical methods, Wikipedia, WordNet|
|Article||BASE, CiteSeerX, Google Scholar|
|Web||Ask, Bing, Google (PDF), Yahoo!|
|Download and mirrors|
|Local copy||Not available|
|Remote mirror(s)||Not available|
|Export and share|
|BibTeX, CSV, RDF, JSON|
|Browse properties · List of magazine articles|
Hindi ranks fourth in terms of speaker's size in the world. In spite of that, it has <0.1 % presence on web due to lack of competent lexical resources, a key reason behind digital gap due to language barrier among Indian masses. In the footsteps of the renowned lexical resource English WordNet, 18 Indian languages initiated building WordNets under the project Indo WordNet. India is a multilingual country with around 122 languages and 234 mother tongues. Many Indian languages still do not have any reliable lexical resource, and the coverage of numerous WordNets under progress is still far from average value of 25,792. The tedious manual process and high cost are major reasons behind unsatisfactory coverage and limping progress. In this paper, we discuss the socio-cultural and economic impact of providing Internet accessibility and present an approach for the automatic generation of WordNets to tackle the lack of competent lexical resources. Problems such as accuracy, association of linguistics specific gloss/example and incorrect back-translations which arise while deviating from traditional approach of compilation by lexicographers are resolved by utilising Wikipedia available for Indian languages. © 2014 Springer-Verlag London.
- This section requires expansion. Please, help!
Probably, this publication is cited by others, but there are no articles available for them in WikiPapers.