Audiovisual speech comprehension of degraded and synthetic visual signals
File | Description | Size | Format
---|---|---|---
Varano-E-2023-PhD-Thesis.pdf | Thesis | 11.75 MB | Adobe PDF
Title: | Audiovisual speech comprehension of degraded and synthetic visual signals |
Authors: | Varano, Enrico |
Item Type: | Thesis or dissertation |
Abstract: | Seeing a speaker’s face helps comprehension, in particular in challenging listening conditions or for those living with hearing loss – an effect thought to arise from the integration of temporal and categorical features carried by the visual stream. Recent studies into the neurobiological mechanisms of speech perception have employed continuous stimuli, an important milestone towards understanding such processes in ecological paradigms. However, efforts to extend this principle further to audiovisual speech are impeded by a lack of high-quality recordings. We seek to close this gap with the AVbook corpus, which includes methods designed to enable synchronised delivery of the audio and visual streams, is presented alongside validation data, and is publicly available to support research in neurobiology and speech recognition. We then employ this corpus to investigate how the cortical tracking of the speech envelope is affected by degraded visual signals. We find that visual signals need to contain information beyond the speech envelope to convey a benefit and, employing electroencephalography, that this benefit is linked to a gain in delta-band activity, evidencing a role of the cortical tracking of words in audiovisual speech comprehension. Recent advances in speech-driven models have made it possible to synthesise photo-realistic talkers from a still portrait. We demonstrate the suitability of such signals for improving speech-in-noise comprehension by showing that humans cannot distinguish between the natural and the synthesised videos, that the latter aid humans in understanding speech, and that audiovisual speech recognisers benefit from these animations too. The work discussed in this thesis sheds light on some of the poorly understood neural and behavioural processes underlying audiovisual speech perception, and characterises the effectiveness of synthesised videos as listening aids. The AVbook corpus significantly reduces the activation energy for further work on the matter and for a range of experimental paradigms beyond those considered here. |
Content Version: | Open Access |
Issue Date: | Jan-2023 |
Date Awarded: | Oct-2023 |
URI: | http://hdl.handle.net/10044/1/115443 |
DOI: | https://doi.org/10.25560/115443 |
Copyright Statement: | Creative Commons Attribution NonCommercial Licence |
Supervisor: | Reichenbach, Johann |
Sponsor/Funder: | Royal British Legion; Engineering and Physical Sciences Research Council |
Funder's Grant Number: | EP/R032602/1 |
Department: | Bioengineering |
Publisher: | Imperial College London |
Qualification Level: | Doctoral |
Qualification Name: | Doctor of Philosophy (PhD) |
Appears in Collections: | Bioengineering PhD theses |