Visual-only discrimination between native and non-native speech
File(s)
georgakisetal_visualonlynativevsnonnative.pdf (815.83 KB)
Accepted version
Author(s)
Georgakis, C
Petridis, S
Pantic, M
Type
Conference Paper
Abstract
Accent is an important biometric characteristic that is defined by the presence of specific traits in the speaking style of an individual. These are identified by patterns in the speech production system, such as those present in the vocal tract or in lip movements. Evidence from linguistics and speech processing research suggests that visual information enhances speech recognition. Intrigued by these findings, along with the assumption that visually perceivable accent-related patterns are transferred from the mother tongue to a foreign language, we investigate the task of discriminating native from non-native speech in English, employing visual features only. Training and evaluation are performed on segments of continuous visual speech, captured by mobile phones, where all speakers read the same text. We apply various appearance descriptors to represent the mouth region at each video frame. Vocabulary-based histograms, being the final representation of dynamic features for all utterances, are used for recognition. Binary classification experiments, discriminating native and non-native speakers, are conducted in a subject-independent manner. Our results show that this task can be addressed by means of an automated approach that uses visual features only.
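The pipeline outlined in the abstract (per-frame mouth descriptors, vocabulary-based histograms per utterance, subject-independent evaluation) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not say how the vocabulary is built, so plain k-means is assumed here, and the function names `build_vocabulary`, `utterance_histogram` and `leave_one_subject_out` are hypothetical. Descriptor extraction and the binary classifier are left out.

```python
import numpy as np

def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Cluster per-frame descriptors with plain k-means (an assumption);
    the k centroids act as the 'visual words' of the vocabulary."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(descriptors), size=k, replace=False)
    centroids = descriptors[start].astype(float)
    for _ in range(iters):
        # Distance from every descriptor to every centroid.
        dists = np.linalg.norm(
            descriptors[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return centroids

def utterance_histogram(frames, centroids):
    """Map each frame to its nearest visual word and return the
    L1-normalised word-count histogram for the utterance."""
    dists = np.linalg.norm(
        frames[:, None, :] - centroids[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centroids)).astype(float)
    return hist / hist.sum()

def leave_one_subject_out(subject_ids):
    """Yield (train_idx, test_idx) pairs in which the held-out subject
    never appears in the training set."""
    subject_ids = np.asarray(subject_ids)
    for s in np.unique(subject_ids):
        yield (np.flatnonzero(subject_ids != s),
               np.flatnonzero(subject_ids == s))

if __name__ == "__main__":
    # Synthetic stand-in data: 6 utterances of 30 frames, 8-dim descriptors.
    rng = np.random.default_rng(1)
    utterances = [rng.normal(size=(30, 8)) for _ in range(6)]
    subjects = [0, 0, 1, 1, 2, 2]
    vocab = build_vocabulary(np.vstack(utterances), k=4)
    features = np.array([utterance_histogram(u, vocab) for u in utterances])
    for train_idx, test_idx in leave_one_subject_out(subjects):
        print(len(train_idx), "training utterances,", len(test_idx), "held out")
```

In a stricter subject-independent protocol the vocabulary would be rebuilt inside each fold from the training subjects only; the demo builds it once over all utterances purely for brevity.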
Date Issued
2014-05-09
Date Acceptance
2014-05-04
Citation
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp.4828-4832
ISSN
1520-6149
Publisher
IEEE
Start Page
4828
End Page
4832
Journal / Book Title
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Copyright Statement
© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Identifier
http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000343655304171&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=1ba7043ffcc86c417c072aa74d649202
Source
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects
Science & Technology
Technology
Acoustics
Engineering, Electrical & Electronic
Engineering
Non-native speech identification
Accent classification
Visual speech processing
Language identification
Audiovisual speech
Recognition
Publication Status
Published
Start Date
2014-05-04
Finish Date
2014-05-09
Coverage Spatial
Florence, Italy