Multichannel overlapping speaker segmentation using multiple hypothesis tracking of acoustic and spatial features
File(s)
IEEE_ICASSP_Paper_2021_10_11_ah.pdf (383.56 KB)
Accepted version
Author(s)
Hogg, Aidan
Naylor, Patrick
Evers, Christine
Type
Conference Paper
Abstract
An essential part of any diarization system is speaker segmentation, which is important for many applications, including speaker indexing and automatic speech recognition (ASR) in multi-speaker environments. Segmentation of overlapping speech has recently been a key focus of work in this area. In this paper, we explore a new multimodal approach to overlapping speaker segmentation that simultaneously tracks both the speaker's fundamental frequency (F0) and the speaker's direction of arrival (DOA). Our proposed multiple hypothesis tracking system, which tracks both features jointly, improves segmentation performance compared with tracking each feature separately. An illustrative example of overlapping speech demonstrates the effectiveness of the proposed system. We also undertake a statistical analysis of 12 meetings from the AMI corpus and show an average improvement of 14.1% in the HIT rate over a commonly used deep learning approach based on a bidirectional long short-term memory (BLSTM) network.
Date Issued
2021-05-13
Date Acceptance
2021-01-30
Citation
2021, pp. 26-30
Publisher
IEEE
Start Page
26
End Page
30
Copyright Statement
© 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Sponsor
Engineering and Physical Sciences Research Council
Identifier
https://ieeexplore.ieee.org/document/9414130
Grant Number
EP/L016796/1
Source
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects
Science & Technology
Technology
Acoustics
Computer Science, Artificial Intelligence
Computer Science, Software Engineering
Engineering, Electrical & Electronic
Imaging Science & Photographic Technology
Computer Science
Engineering
speaker segmentation
Kalman filter
multiple hypothesis tracking
fundamental frequency
direction of arrival
diarization
Publication Status
Published
Start Date
2021-06-06
Finish Date
2021-06-11
Coverage Spatial
Toronto, Ontario, Canada
Date Publish Online
2021-05-13