A lexicon for processing archaic language: the case of XIXth century Slovene
|Author(s)||Tomaž Erjavec, Christoph Ringlstetter, Maja Žorga, Annette Gotscharek|
|Published in||WoLeR 2011: International Workshop on Lexical Resources|
|Keyword(s)||Unknown (Extra: Lexicon, historical texts, Slovene language, Wikisource)|
A lexicon for processing archaic language: the case of XIXth century Slovene is a 2011 conference paper written in English by Tomaž Erjavec, Christoph Ringlstetter, Maja Žorga, and Annette Gotscharek, and published in WoLeR 2011: International Workshop on Lexical Resources.
The paper presents a lexicon to support computational processing of historical Slovene texts. Historical Slovene texts are being increasingly digitised and made available on the internet but are still underutilised as no language technology support is offered for their processing. Appropriate tools and resources would enable full-text searching with modern-day lemmas, modernisation of archaic language to make it more accessible to today's readers, and automatic OCR correction. We discuss the lexicon needed to support tokenisation, modernisation, lemmatisation and part-of-speech tagging of historical texts. The process of lexicon acquisition relies on a proof-read corpus, a large lexicon of contemporary Slovene, and tools to map historical forms to their contemporary equivalents via a set of rewrite rules, and to provide an editing environment for lexicon construction. The lexicon, currently work in progress, will be made publicly available; it should help not only in making digital libraries more accessible but also provide a quantitative basis for linguistic explorations of historical Slovene texts and a prototype electronic dictionary of archaic Slovene.
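The abstract's idea of mapping historical forms to contemporary equivalents via rewrite rules can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the rule set (loosely modelled on older Slovene spellings such as "zh" for "č"), the toy lexicon, and the function names `candidates` and `modernise` are hypothetical and are not taken from the paper or its tools.

```python
# Hypothetical rewrite rules (historical substring -> modern substring).
# Illustrative only; these are NOT the rules used in the paper.
RULES = [
    ("zh", "č"),
    ("ſh", "š"),
    ("ſ", "s"),
]

# Toy stand-in for a large lexicon of contemporary Slovene (assumption).
MODERN_LEXICON = {"človek", "šola", "sestra"}


def candidates(word, rules=RULES):
    """Return every string reachable by applying rules at any matching position."""
    seen = {word}
    frontier = [word]
    while frontier:
        current = frontier.pop()
        for old, new in rules:
            start = current.find(old)
            while start != -1:
                rewritten = current[:start] + new + current[start + len(old):]
                if rewritten not in seen:
                    seen.add(rewritten)
                    frontier.append(rewritten)
                start = current.find(old, start + 1)
    return seen


def modernise(word, lexicon=MODERN_LEXICON):
    """Keep only the candidates attested in the contemporary lexicon."""
    return sorted(candidates(word) & lexicon)


if __name__ == "__main__":
    for historical in ["zhlovek", "ſhola", "ſeſtra", "miza"]:
        print(historical, "->", modernise(historical))
```

Filtering the generated candidates against a contemporary lexicon is what keeps over-generation in check in this sketch: a rule may fire where it should not, but the spurious output is discarded unless it happens to be a modern word.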