Multiple hypothesis tracking for overlapping speaker segmentation

File: 1570546902 (26).pdf
Description: File embargoed until 01 January 10000
Size: 470.2 kB
Format: Adobe PDF
Title: Multiple hypothesis tracking for overlapping speaker segmentation
Authors: Hogg, A
Evers, C
Naylor, P
Item Type: Conference Paper
Abstract: Speaker segmentation is an essential part of any diarization system. Applications of diarization include speaker indexing, improving automatic speech recognition (ASR) performance, and making single-speaker-based algorithms available for use in multi-speaker environments. This paper proposes a multiple hypothesis tracking (MHT) method that exploits the harmonic structure associated with the pitch of voiced speech to segment the onsets and end-points of speech from multiple, overlapping speakers. The proposed method is evaluated against a segmentation system from the literature that uses a spectral representation and is based on bidirectional long short-term memory (BLSTM) networks. The proposed method is shown to achieve comparable performance for segmenting overlapping speakers using only the pitch harmonic information in the MHT framework.
Issue Date: 20-Oct-2019
Date of Acceptance: 15-Jul-2019
Publisher: IEEE
Copyright Statement: This paper is embargoed until publication.
Sponsor/Funder: Engineering & Physical Science Research Council (E
Funder's Grant Number: EP/P001017/1
Conference Name: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
Publication Status: Accepted
Start Date: 2019-10-20
Finish Date: 2019-10-23
Conference Place: New York, NY, U.S.A
Embargo Date: publication subject to indefinite embargo
Appears in Collections:Electrical and Electronic Engineering

Items in Spiral are protected by copyright, with all rights reserved, unless otherwise indicated.
