Enhanced robot speech recognition using biomimetic binaural sound source localisation
File(s)
08371531.pdf (1.8 MB)
Published version
Author(s)
Davila-Chacon, Jorge
Liu, Jindong
Wermter, Stefan
Type
Journal Article
Abstract
Inspired by the behaviour of humans talking in noisy environments, we propose an embodied embedded cognition approach to improve automatic speech recognition (ASR) systems for robots in complex situations, such as with ego-noise, using binaural sound source localisation (SSL). The approach is verified by measuring the impact of SSL with a humanoid robot head on the performance of an ASR system. More specifically, a robot orients itself towards the angle where the signal-to-noise ratio (SNR) of speech is maximised for one microphone before performing an ASR task. First, a spiking neural network inspired by the midbrain auditory system, based on our previous work, is applied to calculate the sound signal angle. Then, a feedforward neural network is used to handle high levels of ego-noise and reverberation in the signal. Finally, the sound signal is fed into an ASR system. For ASR, we use a system developed by our group and compare its performance with and without support from SSL. We test our SSL and ASR systems on two humanoid platforms with different structural and material properties. With our approach we halve the sentence error rate with respect to the common downmixing of both channels. Surprisingly, the ASR performance is more than two times better when the angle between the humanoid head and the sound source allows sound waves to be reflected most intensely from the pinna to the ear microphone, rather than when sound waves arrive perpendicularly to the membrane.
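The orientation step described in the abstract (turn towards the angle where the SNR at one microphone is highest, instead of downmixing both channels) can be illustrated with a minimal sketch. This is not the authors' implementation: the spiking and feedforward networks are not modelled, and the function names `snr_db`, `best_angle` and `downmix`, as well as the power values, are hypothetical and chosen only for illustration.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels, from two power values."""
    return 10.0 * math.log10(signal_power / noise_power)


def best_angle(snr_by_angle):
    """Return the head angle (degrees) with the highest single-microphone SNR."""
    return max(snr_by_angle, key=snr_by_angle.get)


def downmix(left, right):
    """Baseline for comparison: average both channels sample by sample."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]


# Hypothetical (speech power, noise power) pairs at one microphone,
# keyed by head angle in degrees. These are made-up values, not data
# from the paper.
measurements = {0: (1.0, 1.0), 45: (4.0, 1.0), 90: (2.0, 1.0)}

snr_by_angle = {angle: snr_db(s, n) for angle, (s, n) in measurements.items()}
target = best_angle(snr_by_angle)  # angle the robot would turn to before ASR
```

In this sketch the robot would record from the single microphone at `target` and pass that signal to the ASR system, whereas the baseline would pass the output of `downmix` instead.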
Date Issued
2018-06-04
Date Acceptance
2018-04-22
Citation
IEEE Transactions on Neural Networks and Learning Systems, 2018, 30 (1), pp.138-150
ISSN
2162-2388
Publisher
Institute of Electrical and Electronics Engineers
Start Page
138
End Page
150
Journal / Book Title
IEEE Transactions on Neural Networks and Learning Systems
Volume
30
Issue
1
Copyright Statement
© 2018 The Author(s). This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/
License URL
Identifier
https://ieeexplore.ieee.org/abstract/document/8371531
Subjects
Artificial Intelligence & Image Processing
Publication Status
Published
Date Publish Online
2020-06-04