Lexical speaker identification in TV shows
Abstract Lexical information extracted from speech transcripts can be used for speaker identification (SID), either on its own or fused with standard cepstral-based SID systems to improve their performance. This has previously been established mainly on isolated speech from single speakers (NIST SRE corpora, parliamentary speeches). In contrast, this work applies lexical approaches to SID on a different type of data: the REPERE corpus, which consists of unsegmented multiparty conversations, mostly debates, discussions and Q&A sessions from TV shows. It is hypothesized that people give out clues to their identity when speaking in such settings, and this work aims to exploit those clues. The impact on SID performance of the diarization front-end required to pre-process the unsegmented data is also measured. Four lexical SID approaches are studied, including TFIDF, BM25 and LDA-based topic modeling. Results are analysed in terms of TV shows and speaker roles. Lexical approaches achieve low error rates for certain speaker roles such as anchors and journalists, sometimes lower than a standard cepstral-based Gaussian Supervector - Support Vector Machine (GSV-SVM) system. In certain cases, score-level sum fusion of the lexical system with the cepstral-based system also yields a modest improvement over the cepstral-based system alone. To highlight the potential of lexical information not just to improve cepstral-based SID systems but as an independent approach in its own right, initial studies on crossmedia SID are briefly reported. Instead of using speech data, as all cepstral systems require, this approach trains lexical speaker models on Wikipedia texts and then tests them on speech transcripts to identify speakers. © 2014 Springer Science+Business Media New York.
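The TFIDF scoring and score-level sum fusion named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the smoothed IDF formula, cosine scoring, toy transcripts, and the cepstral scores below are all assumptions made for illustration, and real systems would normally normalize scores before summing them.

```python
import math
from collections import Counter


def train_tfidf(docs):
    """Build one TF-IDF vector per speaker from tokenized training text.

    docs maps a speaker label to a list of tokens (hypothetical toy data).
    The smoothed IDF used here is one common variant, chosen as an assumption.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    models = {}
    for speaker, tokens in docs.items():
        tf = Counter(tokens)
        models[speaker] = {t: c * idf[t] for t, c in tf.items()}
    return models, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def lexical_scores(models, idf, test_tokens):
    """Score a test transcript against every speaker model."""
    tf = Counter(t for t in test_tokens if t in idf)
    query = {t: c * idf[t] for t, c in tf.items()}
    return {speaker: cosine(query, vec) for speaker, vec in models.items()}


def sum_fusion(scores_a, scores_b):
    """Score-level sum fusion: add the two systems' scores per speaker."""
    return {speaker: scores_a[speaker] + scores_b[speaker] for speaker in scores_a}


def identify(scores):
    """Return the speaker with the highest score."""
    return max(scores, key=scores.get)


# Hypothetical training transcripts for two speaker roles.
train = {
    "anchor": "good evening welcome to the show tonight".split(),
    "guest": "thank you for inviting me the economy is growing".split(),
}
models, idf = train_tfidf(train)

lexical = lexical_scores(models, idf, "welcome to the show".split())
cepstral = {"anchor": 0.2, "guest": 0.4}  # made-up acoustic scores
fused = sum_fusion(lexical, cepstral)

print(identify(lexical), identify(fused))
```

For crossmedia use as described in the abstract, the same `train_tfidf` call would be given text from another medium (for example encyclopedia articles about each speaker) in place of training transcripts; how well that transfers is an empirical question this sketch does not address.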
Bibtextype misc  +
Doi 10.1007/s11042-014-1940-3  +
Has author Roy A. + , Bredin H. + , Hartmann W. + , Le V.B. + , Barras C. + , Gauvain J.-L. +
Has keyword BM25 + , Broadcast conversations + , Classifier fusion + , Crossmedia learning + , Lexical speaker identification + , Speaker roles + , TFIDF + , Wikipedia +
Issn 13807501  +
Language English +
Number of citations by publication 0  +
Number of references by publication 0  +
Published in Multimedia Tools and Applications +
Title Lexical speaker identification in TV shows +
Type magazine article  +
Year 2014 +
Creation date 4 November 2014 21:42:41  +
Categories Publications without license parameter  + , Publications without remote mirror parameter  + , Publications without archive mirror parameter  + , Publications without paywall mirror parameter  + , Magazine articles  + , Publications without references parameter  + , Publications  +
Modification date 4 November 2014 21:42:41  +
Date 2014  +