Probabilistic explicit topic modeling using Wikipedia
Abstract Despite the popular use of Latent Dirichlet Allocation (LDA) for the automatic discovery of latent topics in document corpora, such topics lack connections with relevant knowledge sources such as Wikipedia, and they can be difficult to interpret because they lack meaningful labels. Furthermore, the topic analysis suffers from a lack of identifiability between topics, not only across independently analyzed corpora but also across distinct runs of the algorithm on the same corpus. This paper introduces two methods for probabilistic explicit topic modeling that address these issues: Latent Dirichlet Allocation with Static Topic-Word Distributions (LDA-STWD) and Explicit Dirichlet Allocation (EDA). Both methods estimate topic-word distributions a priori from Wikipedia articles, with each article corresponding to one topic and the article title serving as the topic label. LDA-STWD and EDA overcome the nonidentifiability, isolation, and uninterpretability of LDA output. We assess their effectiveness through crowd-sourced user studies on two tasks: topic label generation and document label generation. We find that LDA-STWD substantially improves upon the state of the art on the document labeling task, and that both methods otherwise perform on par with a state-of-the-art post hoc method.
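The abstract's core idea, fixing each topic's word distribution in advance from a single Wikipedia article and then inferring only the per-document topic mixture, can be illustrated with a small sketch. This is not the authors' implementation: the function names `estimate_topic_word` and `infer_doc_topics`, the additive smoothing parameter `beta`, the Dirichlet prior `alpha`, and the toy "articles" are all hypothetical, and the per-document collapsed Gibbs sampler with the topic-word distributions held fixed is a simplified stand-in chosen for illustration. The paper's actual estimators and inference procedures for LDA-STWD and EDA may differ.

```python
import random
from collections import Counter


def estimate_topic_word(articles, vocab, beta=0.01):
    """Hypothetical helper: one topic per article, keyed by the article title.

    Each topic's word distribution is the article's word counts over `vocab`,
    smoothed by adding `beta` to every word and normalized to sum to 1.
    """
    phi = {}
    for title, tokens in articles.items():
        counts = Counter(t for t in tokens if t in vocab)
        total = sum(counts.values()) + beta * len(vocab)
        phi[title] = {w: (counts[w] + beta) / total for w in vocab}
    return phi


def infer_doc_topics(doc, phi, alpha=0.1, iters=200, seed=0):
    """Gibbs sampling over token-topic assignments with phi held fixed.

    Only the document's topic proportions are estimated; the topic-word
    distributions in `phi` are never updated. Out-of-vocabulary tokens are
    ignored.
    """
    rng = random.Random(seed)
    topics = list(phi)
    vocab = phi[topics[0]]  # every topic shares the same vocabulary
    tokens = [w for w in doc if w in vocab]
    z = [rng.choice(topics) for _ in tokens]  # random initial assignments
    n_dk = Counter(z)  # tokens currently assigned to each topic
    for _ in range(iters):
        for i, w in enumerate(tokens):
            n_dk[z[i]] -= 1  # remove the current assignment of token i
            # p(z_i = k | rest) is proportional to (n_dk + alpha) * phi[k][w]
            weights = [(n_dk[k] + alpha) * phi[k][w] for k in topics]
            z[i] = rng.choices(topics, weights=weights)[0]
            n_dk[z[i]] += 1
    denom = len(tokens) + alpha * len(topics)
    return {k: (n_dk[k] + alpha) / denom for k in topics}


if __name__ == "__main__":
    # Toy stand-ins for Wikipedia articles; titles double as topic labels.
    articles = {
        "Baseball": "pitcher batter inning pitcher home run".split(),
        "Astronomy": "star galaxy telescope orbit star".split(),
    }
    vocab = {w for toks in articles.values() for w in toks}
    phi = estimate_topic_word(articles, vocab)
    theta = infer_doc_topics("the pitcher threw in the ninth inning".split(), phi)
    print(max(theta, key=theta.get), theta)
```

Because the topic-word distributions are fixed before inference, every run (and every corpus) shares the same topics and the same human-readable labels, which is the identifiability and interpretability property the abstract describes.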
Bibtextype inproceedings
Doi 10.1007/978-3-642-40722-2_7
Has author Hansen J.A., Ringger E.K., Seppi K.D.
Has extra keyword Automatic discovery, Identifiability, Knowledge sources, Latent Dirichlet allocation, Latent dirichlet allocations, Topic analysis, Topic Modeling, Wikipedia articles, Computational linguistics, Statistics, Probability distributions
Isbn 9783642407215
Language English
Number of citations by publication 0
Number of references by publication 0
Pages 69–82
Published in Lecture Notes in Computer Science
Title Probabilistic explicit topic modeling using Wikipedia
Type conference paper
Volume 8105 LNAI
Year 2013
Creation date 7 November 2014 13:32:17
Categories Publications without keywords parameter, Publications without license parameter, Publications without remote mirror parameter, Publications without archive mirror parameter, Publications without paywall mirror parameter, Conference papers, Publications without references parameter, Publications
Modification date 7 November 2014 13:32:17
Date 2013