Speech enhancement for robust automatic speech recognition: Evaluation using a baseline system and instrumental measures

File: 1-s2.0-S0885230816300778-main.pdf
Description: Published version
Size: 1.22 MB
Format: Adobe PDF
Title: Speech enhancement for robust automatic speech recognition: Evaluation using a baseline system and instrumental measures
Author(s): Moore, AH
Peso, P
Naylor, PA
Item Type: Journal Article
Abstract: Automatic speech recognition in everyday environments must be robust to significant levels of reverberation and noise. One strategy to achieve such robustness is multi-microphone speech enhancement. In this study, we present results of an evaluation of different speech enhancement pipelines using a state-of-the-art ASR system for a wide range of reverberation and noise conditions. The evaluation exploits the recently released ACE Challenge database, which includes measured multichannel acoustic impulse responses from 7 different rooms with reverberation times ranging from 0.33 s to 1.34 s. The reverberant speech is mixed with ambient, fan and babble noise recordings made with the same microphone setups in each of the rooms. In the first experiment, performance of the ASR without speech processing is evaluated. Results clearly indicate the deleterious effect of both noise and reverberation. In the second experiment, different speech enhancement pipelines are evaluated, with relative word error rate reductions of up to 82%. Finally, the ability of selected instrumental metrics to predict ASR performance improvement is assessed. The best performing metric, Short-Time Objective Intelligibility Measure, is shown to have a Pearson correlation coefficient of 0.79, suggesting that it is a useful predictor of algorithm performance in these tests.
Publication Date: 8-Dec-2016
Date of Acceptance: 25-Nov-2016
URI: http://hdl.handle.net/10044/1/43057
DOI: https://dx.doi.org/10.1016/j.csl.2016.11.003
ISSN: 1095-8363
Publisher: Elsevier
Start Page: 574
End Page: 584
Journal / Book Title: Computer Speech and Language
Volume: 46
Copyright Statement: © 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Sponsor/Funder: Commission of the European Communities
Commission of the European Communities
Engineering and Physical Sciences Research Council (EPSRC)
Funder's Grant Number: PITN-GA-2012-316969
609465
EP/M026698/1
Keywords: Speech-Language Pathology & Audiology
0801 Artificial Intelligence And Image Processing
1702 Cognitive Science
Publication Status: Published
Appears in Collections: Faculty of Engineering
Electrical and Electronic Engineering



