Automated non-content word list generation using hLDA
|Author(s)||Krug W., Tomlinson M.T.|
|Published in||FLAIRS 2013 - Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference|
|Keyword(s)||Unknown (Extra: Authorship attribution, Function words, Natural language processing, On-machines, Unsupervised extraction, Wikipedia, Wikipedia articles, Word lists, Artificial intelligence, Natural language processing systems, Linguistics)|
Automated non-content word list generation using hLDA is a 2013 conference paper written in English by Krug W. and Tomlinson M.T., published in FLAIRS 2013 - Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference.
In this paper, we present a language-independent method for the automatic, unsupervised extraction of non-content words from a corpus of documents. This method permits the creation of word lists that may be used in place of traditional function word lists in various natural language processing tasks. As an example we generated lists of words from a corpus of English, Chinese, and Russian posts extracted from Wikipedia articles and Wikipedia Wikitalk discussion pages. We applied these lists to the task of authorship attribution on this corpus to compare the effectiveness of lists of words extracted with this method to expert-created function word lists and frequent word lists (a common alternative to function word lists). hLDA lists perform comparably to frequent word lists. The trials also show that corpus-derived lists tend to perform better than more generic lists, and both sets of generated lists significantly outperformed the expert lists. Additionally, we evaluated the performance of an English expert list on machine translations of our Chinese and Russian documents, showing that our method also outperforms this alternative. Copyright © 2013, Association for the Advancement of Artificial Intelligence. All rights reserved.
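The abstract compares hLDA-derived lists against frequent word lists as features for authorship attribution. The following is a minimal Python sketch of that frequent-word baseline only, not of the paper's hLDA method. The regex tokenizer, the cosine-similarity comparison, and the function names (`frequent_word_list`, `profile`, `attribute`) are all assumptions made for illustration; the paper's actual features and classifier are not described in this record.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase word tokens split by a Unicode-aware regex.
    # Chinese text would need a proper word segmenter instead.
    return re.findall(r"\w+", text.lower())


def frequent_word_list(documents, size):
    """Return the `size` most frequent words in the corpus."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:size]]


def profile(text, word_list):
    """Relative frequency of each list word in the text."""
    tokens = tokenize(text)
    total = len(tokens) or 1  # avoid division by zero on empty text
    counts = Counter(tokens)
    return [counts[word] / total for word in word_list]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(text, author_texts, word_list):
    """Pick the author whose word-list profile is closest to the text's."""
    target = profile(text, word_list)
    return max(
        author_texts,
        key=lambda author: cosine(target, profile(author_texts[author], word_list)),
    )


# Hypothetical toy corpus, for illustration only.
author_texts = {
    "a": "the cat and the dog and the bird",
    "b": "of a cat of a dog of a bird",
}
word_list = frequent_word_list(list(author_texts.values()), 3)
guess = attribute("the fish and the frog", author_texts, word_list)
```

Under these assumptions, a corpus-derived list is simply whatever `frequent_word_list` returns for the corpus at hand, whereas an expert function word list would be passed to `profile` as a fixed `word_list`.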