A study of salient modulation domain features for speaker identification

File: 210913_APSIPA2021_Paper_Compressed.pdf
Description: Published version
Size: 1.09 MB
Format: Adobe PDF
Title: A study of salient modulation domain features for speaker identification
Authors: McKnight, S
Hogg, A
Neo, V
Naylor, P
Item Type: Conference Paper
Abstract: This paper studies the ranges of acoustic and modulation frequencies of speech most relevant for identifying speakers and compares the speaker-specific information present in the temporal envelope against that present in the temporal fine structure. This study uses correlation and feature importance measures, random forest and convolutional neural network models, and reconstructed speech signals with specific acoustic and/or modulation frequencies removed to identify the salient points. It is shown that the range of modulation frequencies associated with the fundamental frequency is more important than the 1-16 Hz range most commonly used in automatic speech recognition, and that the 0 Hz modulation frequency band contains significant speaker information. It is also shown that the temporal envelope is more discriminative among speakers than the temporal fine structure, but that the temporal fine structure still contains useful additional information for speaker identification. This research aims to provide a timely addition to the literature by identifying specific aspects of speech relevant for speaker identification that could be used to enhance the discriminant capabilities of machine learning models.
Issue Date: 3-Feb-2022
Date of Acceptance: 31-Aug-2021
URI: http://hdl.handle.net/10044/1/92134
Publisher: IEEE
Start Page: 705
End Page: 712
Copyright Statement: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Conference Name: Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Keywords: Science & Technology
Technology
Computer Science, Information Systems
Computer Science, Software Engineering
Engineering, Electrical & Electronic
Computer Science
Engineering
TEMPORAL ENVELOPE
SPEECH RECOGNITION
NORMAL-HEARING
FREQUENCY
REPRESENTATION
PERCEPTION
Publication Status: Published
Start Date: 2021-12-14
Finish Date: 2021-12-17
Conference Place: Tokyo, Japan
Online Publication Date: 2022-02-03
Appears in Collections: Electrical and Electronic Engineering
Dyson School of Design Engineering
Faculty of Engineering