Annette Gotscharek

From WikiPapers
Jump to: navigation, search

Annette Gotscharek is an author.


Only those publications related to wikis are shown here.
Title Keyword(s) Published in Language DateThis property is a special property in this wiki. Abstract R C
A lexicon for processing archaic language: the case of XIXth century Slovene WoLeR 2011: International Workshop on Lexical Resources English 2011 The paper presents a lexicon to support computational processing of historical Slovene texts. Historical Slovene texts are being increasingly digitised and made available on the internet but are still underutilised as no language technology support is offered for their processing. Appropriate tools and resources would enable full-text searching with modern-day lemmas, modernisation of archaic language to make it more accessible to today‟s readers, and automatic OCR correction. We discuss the lexicon needed to support tokenisation, modernisation, lemmatisation and part-of-speech tagging of historical texts. The process of lexicon acquisition relies on a proof-read corpus, a large lexicon of contemporary Slovene, and tools to map historical forms to their contemporary equivalents via a set of rewrite rules, and to provide an editing environment for lexicon construction. The lexicon, currently work in progress, will be made publicly available; it should help not only in making digital libraries more accessible but also provide a quantitative basis for linguistic explorations of historical Slovene texts and a prototype electronic dictionary of archaic Slovene. 1 0