A study of salient modulation domain features for speaker identification
File | Description | Size | Format | |
---|---|---|---|---|
210913_APSIPA2021_Paper_Compressed.pdf | Published version | 1.09 MB | Adobe PDF | View/Open |
Title: | A study of salient modulation domain features for speaker identification |
Authors: | McKnight, S; Hogg, A; Neo, V; Naylor, P |
Item Type: | Conference Paper |
Abstract: | This paper studies the ranges of acoustic and modulation frequencies of speech most relevant for identifying speakers and compares the speaker-specific information present in the temporal envelope against that present in the temporal fine structure. This study uses correlation and feature importance measures, random forest and convolutional neural network models, and reconstructed speech signals with specific acoustic and/or modulation frequencies removed to identify the salient points. It is shown that the range of modulation frequencies associated with the fundamental frequency is more important than the 1-16 Hz range most commonly used in automatic speech recognition, and that the 0 Hz modulation frequency band contains significant speaker information. It is also shown that the temporal envelope is more discriminative among speakers than the temporal fine structure, but that the temporal fine structure still contains useful additional information for speaker identification. This research aims to provide a timely addition to the literature by identifying specific aspects of speech relevant for speaker identification that could be used to enhance the discriminant capabilities of machine learning models. |
Issue Date: | 3-Feb-2022 |
Date of Acceptance: | 31-Aug-2021 |
URI: | http://hdl.handle.net/10044/1/92134 |
Publisher: | IEEE |
Start Page: | 705 |
End Page: | 712 |
Copyright Statement: | © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Conference Name: | Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) |
Keywords: | Science & Technology; Technology; Computer Science, Information Systems; Computer Science, Software Engineering; Engineering, Electrical & Electronic; Computer Science; Engineering; TEMPORAL ENVELOPE; SPEECH RECOGNITION; NORMAL-HEARING; FREQUENCY; REPRESENTATION; PERCEPTION |
Publication Status: | Published |
Start Date: | 2021-12-14 |
Finish Date: | 2021-12-17 |
Conference Place: | Tokyo, Japan |
Online Publication Date: | 2022-02-03 |
Appears in Collections: | Electrical and Electronic Engineering; Dyson School of Design Engineering; Faculty of Engineering |