Automatic extraction of Polish language errors from text edition history

From WikiPapers
Jump to: navigation, search

Automatic extraction of Polish language errors from text edition history is a 2013 conference paper written in English by Grundkiewicz R. and published in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).

[edit] Abstract

There are no large error corpora for a number of languages, despite the fact that they have multiple applications in natural language processing. The main reason underlying this situation is a high cost of manual corpora creation. In this paper we present the methods of automatic extraction of various kinds of errors such as spelling, typographical, grammatical, syntactic, semantic, and stylistic ones from text edition histories. By applying of these methods to the Wikipedia's article revision history, we created the large and publicly available corpus of naturally-occurring language errors for Polish, called PlEWi. Finally, we analyse and evaluate the detected error categories in our corpus.

[edit] References

This section requires expansion. Please, help!

Cited by

Probably, this publication is cited by others, but there are no articles available for them in WikiPapers.