Prediction of user emotion and dialogue success using audio spectrograms and convolutional neural networks
File(s)
SIGDIAL2019_Margarita.pdf (304.99 KB)
Accepted version
Author(s)
Lykartsis, Athanasios
Kotti, Margarita
Type
Conference Paper
Abstract
In this paper we aim to predict dialogue success and user satisfaction as well as emotion on a turn level. To achieve this, we investigate the use of spectrogram representations, extracted from audio files, in combination with several types of convolutional neural networks. The experiments were performed on the Let’s Go V2 database, comprising 5065 audio files and having labels for subjective and objective dialogue turn success, as well as the emotional state of the user. Results show that by using only audio, it is possible to predict turn success with very high accuracy for all three labels (90%). The best performing input representation were 1s long mel-spectrograms in combination with a CNN with a bottleneck architecture. The resulting system has the potential to be used real-time. Our results significantly surpass the state of the art for dialogue success prediction based only on audio.
Date Issued
2019-09-11
Date Acceptance
2019-07-05
Citation
20th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2019
Publisher
Association for Computational Linguistics (ACL)
Journal / Book Title
20th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Copyright Statement
© 2019 The Association for Computational Linguistics
Source
SIGDIAL 2019
Publication Status
Published
Start Date
2019-09-11
Finish Date
2019-09-13
Coverage Spatial
Stockholm, Sweden