Speech-driven facial animations improve speech-in-noise comprehension of humans
File(s)
fnins-15-781196 (1).pdf (2.5 MB)
Published version
Publication available at
Author(s)
Type
Journal Article
Abstract
Understanding speech becomes a demanding task when the environment is noisy. Comprehension of speech in noise can be substantially improved by looking at the speaker’s face, and this audiovisual benefit is even more pronounced in people with hearing impairment. Recent advances in AI have made it possible to synthesize photorealistic talking faces from a speech recording and a still image of a person’s face in an end-to-end manner. However, it has remained unknown whether such facial animations improve speech-in-noise comprehension. Here we consider facial animations produced by a recently introduced generative adversarial network (GAN), and show that humans cannot distinguish between the synthesized and the natural videos. Importantly, we then show that the end-to-end synthesized videos significantly aid humans in understanding speech in noise, although the natural facial motions yield a yet higher audiovisual benefit. We further find that an audiovisual speech recognizer (AVSR) benefits from the synthesized facial animations as well. Our results suggest that synthesizing facial motions from speech can be used to aid speech comprehension in difficult listening environments.
Online Publication Date
2022-01-25T15:36:29Z
Date Acceptance
2021-11-29
ISSN
1662-453X
Publisher
Frontiers Media
Journal / Book Title
Frontiers in Neuroscience
Copyright Statement
© 2022 Varano, Vougioukas, Ma, Petridis, Pantic and Reichenbach. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
License URI
Identifier
https://www.frontiersin.org/articles/10.3389/fnins.2021.781196/full
Subjects
1109 Neurosciences
1701 Psychology
1702 Cognitive Sciences
Publication Status
Published
Date Publish Online
2022-01-05