Audiovisual speech comprehension of degraded and synthetic visual signals

File: Varano-E-2023-PhD-Thesis.pdf (Thesis, 11.75 MB, Adobe PDF)
Title: Audiovisual speech comprehension of degraded and synthetic visual signals
Authors: Varano, Enrico
Item Type: Thesis or dissertation
Abstract: Seeing a speaker’s face helps comprehension, particularly in challenging listening conditions or for those living with hearing loss – an effect thought to arise from the integration of temporal and categorical features carried by the visual stream. Recent studies of the neurobiological mechanisms of speech perception have employed continuous stimuli, an important milestone towards understanding such processes in ecological paradigms. However, efforts to extend this principle to audiovisual speech are impeded by a lack of high-quality recordings. We seek to close this gap with the AVbook corpus, which includes methods designed to enable synchronised delivery of the audio and visual streams, is presented alongside validation data, and is publicly available to support research in neurobiology and speech recognition. We then employ this corpus to investigate how cortical tracking of the speech envelope is affected by degraded visual signals. We find that visual signals need to contain information beyond the speech envelope to convey a benefit and, employing electroencephalography, that this benefit is linked to a gain in delta-band activity, evidencing a role for the cortical tracking of words in audiovisual speech comprehension. Recent advances in speech-driven models have made it possible to synthesise photo-realistic talkers from a still portrait. We demonstrate the suitability of such signals for improving speech-in-noise comprehension by showing that humans cannot distinguish between the natural and the synthesised videos, that the latter aid humans in understanding speech, and that audiovisual speech recognisers benefit from these animations too. The work discussed in this thesis sheds light on some of the poorly understood neural and behavioural processes underlying audiovisual speech perception, and characterises the effectiveness of synthesised videos as listening aids. The AVbook corpus significantly lowers the barrier to further work on the topic and to a range of experimental paradigms beyond those considered here.
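
The cortical-tracking analyses mentioned in the abstract typically relate a slow acoustic envelope to band-limited EEG. The sketch below is only an illustration of that general idea, not the analysis pipeline used in the thesis; the function names, the 1–4 Hz delta band, and the simple lagged-correlation tracking index are all assumptions.

```python
# Illustrative sketch (not the thesis's pipeline): a simple delta-band
# envelope-tracking index between a speech signal and one EEG channel.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt, resample


def band_filter(x, lo, hi, fs, order=3):
    """Zero-phase Butterworth band-pass filter."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)


def delta_band_tracking(audio, fs_audio, eeg, fs_eeg, max_lag_s=0.3):
    """Correlate the delta-band speech envelope with delta-band EEG over a
    range of latencies; the peak correlation serves as a crude tracking index."""
    # Broadband amplitude envelope of the speech signal
    envelope = np.abs(hilbert(audio))
    # Downsample the envelope to the EEG sampling rate
    envelope = resample(envelope, int(len(envelope) * fs_eeg / fs_audio))
    n = min(len(envelope), len(eeg))
    envelope, eeg = envelope[:n], eeg[:n]
    # Restrict both signals to the delta band (assumed here to be 1-4 Hz)
    env_d = band_filter(envelope, 1.0, 4.0, fs_eeg)
    eeg_d = band_filter(eeg, 1.0, 4.0, fs_eeg)
    # Correlate at positive lags, i.e. with the EEG lagging the stimulus
    max_lag = int(max_lag_s * fs_eeg)
    corrs = [np.corrcoef(env_d[: n - lag], eeg_d[lag:n])[0, 1]
             for lag in range(max_lag)]
    return max(corrs)
```

In practice such analyses are usually done with regularised encoding or decoding models (temporal response functions) rather than a single lagged correlation; the sketch only conveys the envelope-to-EEG comparison at its core.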
Content Version: Open Access
Issue Date: Jan-2023
Date Awarded: Oct-2023
URI: http://hdl.handle.net/10044/1/115443
DOI: https://doi.org/10.25560/115443
Copyright Statement: Creative Commons Attribution NonCommercial Licence
Supervisor: Reichenbach, Johann
Sponsor/Funder: Royal British Legion
Engineering and Physical Sciences Research Council
Funder's Grant Number: EP/R032602/1
Department: Bioengineering
Publisher: Imperial College London
Qualification Level: Doctoral
Qualification Name: Doctor of Philosophy (PhD)
Appears in Collections: Bioengineering PhD theses