Computationally Efficient and Robust BIC-Based Speaker Segmentation
File(s)
IEEE_TRANS_ASLP_2008_Margarita_Kotti.pdf (334.93 KB)
Accepted version
Author(s)
Kotti, Margarita
Benetos, Emmanouil
Kotropoulos, Constantine
Type
Journal Article
Abstract
An algorithm for automatic speaker segmentation based on the Bayesian information criterion (BIC) is presented. BIC tests are not performed at every window shift, as in previous approaches, but only when a speaker change is most likely to occur. The next probable change point is estimated using a model of utterance durations. The inverse Gaussian distribution is found to fit the utterance durations best. As a result, fewer BIC tests are needed, making the proposed system less computationally demanding in time and memory, and considerably more efficient with respect to missed speaker change points. A feature selection algorithm based on a branch and bound search strategy is applied in order to identify the most efficient features for speaker segmentation. Furthermore, a new theoretical formulation of BIC is derived by applying centering and simultaneous diagonalization. This formulation is considerably more computationally efficient than the standard BIC when the covariance matrices are estimated by estimators other than the usual maximum-likelihood ones. Two commonly used pairs of figures of merit are employed and their relationship is established. Computational efficiency is achieved through the speaker utterance modeling, whereas robustness is achieved by feature selection and by applying BIC tests at appropriately selected time instants. Experimental results indicate that the proposed modifications yield superior performance compared to existing approaches. © 2008 IEEE.
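For orientation, the standard BIC test that the abstract takes as its baseline can be sketched as below. This is the textbook Delta-BIC with maximum-likelihood, full-covariance Gaussians, not the centered and simultaneously diagonalized formulation derived in the paper, and it does not include the utterance-duration model. The function names (`covariance`, `log_det`, `delta_bic`), the `penalty_weight` parameter, and the synthetic two-dimensional frames are illustrative assumptions, not taken from the article.

```python
import math
import random


def covariance(frames):
    """Maximum-likelihood (biased) covariance of a list of d-dimensional frames."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in frames:
        for i in range(d):
            for j in range(d):
                cov[i][j] += (f[i] - mean[i]) * (f[j] - mean[j]) / n
    return cov


def log_det(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    d = len(matrix)
    chol = [[0.0] * d for _ in range(d)]
    total = 0.0
    for i in range(d):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][j] = math.sqrt(s)
                total += 2.0 * math.log(chol[i][j])
            else:
                chol[i][j] = s / chol[j][j]
    return total


def delta_bic(left, right, penalty_weight=1.0):
    """Standard Delta-BIC for a candidate change point between two chunks.

    Positive values favour modelling the chunks with two separate Gaussians,
    i.e. a speaker change at the boundary between `left` and `right`.
    """
    n1, n2 = len(left), len(right)
    n, d = n1 + n2, len(left[0])
    gain = 0.5 * (n * log_det(covariance(left + right))
                  - n1 * log_det(covariance(left))
                  - n2 * log_det(covariance(right)))
    n_params = d + d * (d + 1) / 2  # mean vector plus full covariance matrix
    return gain - penalty_weight * 0.5 * n_params * math.log(n)


# Synthetic stand-ins for feature frames of two speakers (assumed data).
rng = random.Random(0)
speaker_a = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
speaker_b = [[rng.gauss(5.0, 1.0), rng.gauss(-5.0, 1.0)] for _ in range(200)]

print("different speakers:", delta_bic(speaker_a, speaker_b))
print("same speaker:      ", delta_bic(speaker_a[:100], speaker_a[100:]))
```

In a sliding-window segmenter this test would be evaluated at each candidate boundary; the approach summarized in the abstract reduces how often it is evaluated by predicting where the next change is likely to fall.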
Date Issued
2008-07
Citation
IEEE Transactions on Audio, Speech, and Language Processing, 2008, 16 (5), pp.920-933
URI
http://hdl.handle.net/10044/1/11710
DOI
https://www.dx.doi.org/10.1109/TASL.2008.925152
ISSN
1558-7916
Publisher
IEEE
Start Page
920
End Page
933
Journal / Book Title
IEEE Transactions on Audio, Speech, and Language Processing
Volume
16
Issue
5
Copyright Statement
© 2008 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
License URL
http://www.rioxx.net/licenses/all-rights-reserved