A lexicon for processing archaic language: the case of XIXth century Slovene
Abstract The paper presents a lexicon to support computational processing of historical Slovene texts. Historical Slovene texts are being increasingly digitised and made available on the internet but are still underutilised as no language technology support is offered for their processing. Appropriate tools and resources would enable full-text searching with modern-day lemmas, modernisation of archaic language to make it more accessible to today's readers, and automatic OCR correction. We discuss the lexicon needed to support tokenisation, modernisation, lemmatisation and part-of-speech tagging of historical texts. The process of lexicon acquisition relies on a proof-read corpus, a large lexicon of contemporary Slovene, and tools to map historical forms to their contemporary equivalents via a set of rewrite rules, and to provide an editing environment for lexicon construction. The lexicon, currently work in progress, will be made publicly available; it should help not only in making digital libraries more accessible but also provide a quantitative basis for linguistic explorations of historical Slovene texts and a prototype electronic dictionary of archaic Slovene.
Bibtextype inproceedings
Has author Tomaž Erjavec, Christoph Ringlstetter, Maja Žorga, Annette Gotscharek
Has extra keyword Lexicon, Historical texts, Slovene language, Wikisource
Has reference Infrastruktura slovenistične literarne vede
Has remote mirror http://alpage.inria.fr/~sagot/woler2011/WoLeR2011/Program_files/WoLeR%202011%20-%20Erjavec%20Ringlstetter%20Z%CC%8Corga%20Gotscharek.pdf
Has webcitation mirror 67RvHlwcf
Language English
Number of citations by publication 0
Number of references by publication 1
Published in WoLeR 2011: International Workshop on Lexical Resources
Title A lexicon for processing archaic language: the case of XIXth century Slovene
Type conference paper
Year 2011
Creation date 5 May 2012 19:11:39
Categories Publications without keywords parameter, Publications without license parameter, Publications without DOI parameter, Publications without paywall mirror parameter, Conference papers, Publications
Modification date 6 May 2012 04:20:57
Date 2011
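
The abstract mentions mapping historical forms to their contemporary equivalents via a set of rewrite rules, checked against a large lexicon of contemporary Slovene. A minimal sketch of that idea follows. The rules, the toy lexicon, and the names `REWRITE_RULES`, `MODERN_LEXICON`, `candidates` and `modernise` are all assumptions made for illustration (the rules loosely follow Bohorič-alphabet spellings); they are not the paper's actual rule set or tooling.

```python
import re

# Hypothetical rewrite rules (regex pattern -> replacement), applied in order.
# Illustrative only; not taken from the paper.
REWRITE_RULES = [
    (r"zh", "č"),
    (r"ſh", "š"),
    (r"sh", "ž"),
    (r"ſ", "s"),
]

# Toy stand-in for a large lexicon of contemporary Slovene word forms.
MODERN_LEXICON = {"človek", "hiša", "sestra"}


def candidates(word):
    """Build candidate forms by optionally applying each rule in turn."""
    forms = {word}
    for pattern, replacement in REWRITE_RULES:
        # Keep the unchanged forms as well, so every rule is optional.
        forms |= {re.sub(pattern, replacement, form) for form in forms}
    return forms


def modernise(word):
    """Return the candidate forms that are attested in the modern lexicon."""
    return sorted(candidates(word.lower()) & MODERN_LEXICON)


if __name__ == "__main__":
    for historical in ["zhlovek", "hiſha", "ſeſtra", "xyz"]:
        print(historical, "->", modernise(historical))
```

Filtering candidates against the contemporary lexicon is what keeps over-eager rules in check: a rewrite is accepted only if it yields a known modern form, and a word with no match returns an empty list, which could then be queued for manual editing.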