The Tanl lemmatizer enriched with a sequence of cascading filters
|Author(s)||Attardi G., Dei Rossi S., Simi M.|
|Published in||Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)|
|Keyword(s)||Deep Search, Lemmatization, Lexicon, Part-of-Speech tagging (Extra: Deep Search, External resources, Lemmatization, Lemmatizer, Lexicon, Part of speech tagging, Semantic ambiguities, Wikipedia, Semantics, Tools, Natural language processing systems)|
The Tanl lemmatizer enriched with a sequence of cascading filters is a 2013 conference paper written in English by Attardi G., Dei Rossi S., Simi M. and published in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).
We have extended an existing lemmatizer, which relies on a lexicon of about 1.2 million forms in which lemmas are indexed by rich PoS tags, with a sequence of cascading filters, each in charge of handling specific issues related to out-of-dictionary words. The last two filters are devoted to resolving semantic ambiguities between words of the same syntactic category by querying external resources: an enriched index built on the Italian Wikipedia and the Google index.
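The cascade described in the abstract can be illustrated with a minimal sketch. This is not the Tanl implementation: the toy lexicon, the coarse PoS labels, and the filters `lexicon_filter`, `clitic_filter`, and `fallback_filter` are all hypothetical, and the Wikipedia and Google disambiguation filters are not modeled. The sketch only shows the control flow, in which each filter is tried in order and the first one that returns a lemma wins.

```python
from typing import Callable, List, Optional, Tuple

# Toy lexicon (an assumption for illustration): (form, PoS tag) -> lemma.
# The real lexicon is described as holding about 1.2 million forms.
LEXICON = {
    ("case", "NOUN"): "casa",
    ("mangiare", "VERB"): "mangiare",
}

# A filter takes a form and its PoS tag and returns a lemma, or None to pass.
Filter = Callable[[str, str], Optional[str]]

# Hypothetical list of enclitic pronouns handled by clitic_filter.
CLITICS = ("lo", "la", "li", "le")


def lexicon_filter(form: str, pos: str) -> Optional[str]:
    """Direct lookup of the (form, PoS) pair in the lexicon."""
    return LEXICON.get((form.lower(), pos))


def clitic_filter(form: str, pos: str) -> Optional[str]:
    """Hypothetical out-of-dictionary filter: strip an enclitic pronoun
    from a verb, restore the final 'e', and retry the lexicon."""
    if pos != "VERB":
        return None
    lowered = form.lower()
    for clitic in CLITICS:
        if lowered.endswith(clitic):
            candidate = lowered[: -len(clitic)] + "e"
            lemma = LEXICON.get((candidate, "VERB"))
            if lemma is not None:
                return lemma
    return None


def fallback_filter(form: str, pos: str) -> Optional[str]:
    """Last resort: use the lowercased form itself as the lemma."""
    return form.lower()


DEFAULT_CASCADE: List[Filter] = [lexicon_filter, clitic_filter, fallback_filter]


def lemmatize(form: str, pos: str,
              filters: List[Filter] = DEFAULT_CASCADE) -> Tuple[str, str]:
    """Run the filters in order; return the lemma and the name of the
    filter that produced it."""
    for f in filters:
        lemma = f(form, pos)
        if lemma is not None:
            return lemma, f.__name__
    raise ValueError("no filter produced a lemma")


if __name__ == "__main__":
    for form, pos in [("case", "NOUN"), ("mangiarlo", "VERB"), ("xyzzy", "NOUN")]:
        print(form, pos, lemmatize(form, pos))
```

Ordering the filters from most to least reliable keeps the cheap lexicon lookup on the common path and reserves heuristic or external lookups for forms that earlier filters could not resolve.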