Open-domain topic identification of out-of-domain utterances using
Wikipedia
File(s)2101.11134v1.pdf (450.37 KB)
Author(s)
Type
Working Paper
Abstract
Users of spoken dialogue systems (SDS) expect high quality interactions
across a wide range of diverse topics. However, the implementation of SDS
capable of responding to every conceivable user utterance in an informative way
is a challenging problem. Multi-domain SDS must necessarily identify and deal
with out-of-domain (OOD) utterances to generate appropriate responses as users
do not always know in advance what domains the SDS can handle. To address this
problem, we extend the current state of the art in multi-domain SDS by
estimating the topic of OOD utterances using external knowledge representation
from Wikipedia. Experimental results on real human-to-human dialogues show
that our approach does not degrade domain prediction performance compared
to the base model. More significantly, our joint training predicts the
nearest Wikipedia article up to about 30% more accurately than the
benchmarks.
Date Issued
2021-01-26
Citation
2021
Publisher
arXiv
Copyright Statement
© 2021 The Author(s). This item is licensed with CC BY-NC-ND https://creativecommons.org/licenses/by-nc-nd/4.0/
Identifier
http://arxiv.org/abs/2101.11134v1
Subjects
cs.CL
cs.LG