Realising affect-sensitive multimodal human-computer interface: hardware and software infrastructure

File: Shen-J-2014-PhD-Thesis.pdf (Thesis, 2.71 MB, Adobe PDF)
Title: Realising affect-sensitive multimodal human-computer interface: hardware and software infrastructure
Authors: Shen, Jie
Item Type: Thesis or dissertation
Abstract: With the industry's recent paradigm shift from PC-centred applications to services delivered through ubiquitous computing in a more human-centred manner, multimodal human-computer interfaces (MHCI) have become an emerging research topic. An important but often neglected obstacle is the lack of appropriate system-integration tools, which hinders the development of MHCI systems. The work presented in this thesis therefore aims to deliver hardware and software infrastructure that supports the full development cycle of MHCI systems. Specifically, we first built a hardware platform for synchronised multimodal data capture to support and facilitate automatic human-behaviour understanding from multiple audiovisual sensors. We then developed a software framework, called the HCI^2 Framework, to facilitate the modular development and rapid prototyping of readily applicable MHCI systems. As a proof of concept, we also present an affect-sensitive game with the humanoid robot NAO developed using the HCI^2 Framework. Studies on automatic human-behaviour understanding require high-bandwidth recording from multiple cameras, as well as from other sensors such as microphones and eye-gaze trackers. In addition, sensor fusion must be realised with high accuracy so as to achieve tight synchronisation between sensors and, in turn, enable studies of correlation between various behavioural signals. Using commercial off-the-shelf components can compromise quality and accuracy for several reasons: the combined data rate of multiple sensors, unknown offsets and rate discrepancies between independent hardware clocks, the absence of trigger inputs or outputs in the hardware, and differing methods for time-stamping the recorded data. To achieve accurate synchronisation, we centralise the synchronisation task by recording all trigger or timestamp signals with a multi-channel audio interface.
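The centralised scheme described above can be illustrated with a small sketch (not the thesis's actual code; the sample rate, channel layout, pulse spacing, and detection threshold here are assumptions): because every channel of a multi-channel audio interface shares a single sample clock, trigger pulses recorded on different channels are automatically timestamped on one common time base, so the offset between two sensors reduces to a difference of edge positions.

```python
import numpy as np

AUDIO_RATE = 96_000  # Hz; sample clock shared by all channels of the interface

def rising_edges(channel, threshold=0.5):
    """Return sample indices where the recorded trigger signal crosses
    the threshold upwards (one index per trigger pulse)."""
    above = channel > threshold
    return np.flatnonzero(~above[:-1] & above[1:]) + 1

def frame_times(trigger_channel):
    """Map each trigger pulse to a time (in seconds) on the audio
    interface's sample clock, which serves as the common time base."""
    return rising_edges(trigger_channel) / AUDIO_RATE

# Synthetic example: two cameras whose per-frame trigger pulses were
# recorded on separate channels of the same audio interface.
n = AUDIO_RATE                    # one second of 'recording' per channel
cam_a = np.zeros(n)
cam_b = np.zeros(n)
cam_a[1000::1600] = 1.0           # camera A: a pulse every 1600 samples (60 fps)
cam_b[1400::1600] = 1.0           # camera B: same rate, 400 samples later

# Offset of camera B relative to camera A on the shared clock.
offset = frame_times(cam_b)[0] - frame_times(cam_a)[0]
```

Because both channels are sampled by the same converter, this offset estimate is limited only by the audio sample period (about 10 µs at 96 kHz), not by the cameras' independent clocks.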
For sensors that lack an external trigger signal, we let the computer that captures the sensor data periodically generate timestamp signals from its serial-port output. These signals can also serve as a common time base to synchronise multiple asynchronous audio interfaces. The resulting data-recording platform, built upon two consumer-grade PCs, is capable of capturing 8-bit video data at 1024 × 1024 spatial and 59.1 Hz temporal resolution from at least 14 cameras, together with 8 channels of 24-bit audio at 96 kHz and eye-gaze tracking results sampled at 60 or 120 Hz. The attained synchronisation accuracy is unprecedented to date. To facilitate rapid development of readily applicable MHCI systems using algorithms that detect and track behavioural signals (e.g. a face detector, facial fiducial-point tracker, or expression recogniser), a software integration framework is required. The proposed software framework, called the HCI^2 Framework, is built upon a publish/subscribe (P/S) architecture. It implements a shared-memory-based data transport protocol for message delivery and a TCP-based system management protocol; the latter ensures that the integrity of the system structure is maintained at runtime. With the inclusion of ‘bridging modules’, the HCI^2 Framework is interoperable with other software frameworks, including Psyclone and ActiveMQ. In addition to the core communication middleware, we also present the integrated development environment (IDE) of the HCI^2 Framework, which provides a complete graphical environment supporting every step of a typical MHCI system development process: module development, debugging, packaging, and management, as well as whole-system management and testing. Quantitative evaluation indicates that our framework outperforms other similar tools in terms of average message latency and maximum data throughput in a typical single-PC scenario.
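The publish/subscribe pattern underlying the HCI^2 Framework can be sketched in a few lines (an in-process illustration of the pattern only, not the framework's API; its actual transport is shared-memory- and TCP-based, and all names below are hypothetical): modules register callbacks for named topics, and publishers deliver messages without knowing which modules, if any, are listening.

```python
from collections import defaultdict
from typing import Any, Callable

class Broker:
    """Minimal in-process publish/subscribe broker: modules subscribe
    callbacks to named topics and publish messages to those topics."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> None:
        # Deliver the message to every subscriber of this topic; the
        # publisher is fully decoupled from the consumers.
        for callback in self._subscribers[topic]:
            callback(message)

# Example wiring (hypothetical module names): an expression-recogniser
# module consumes frames published by a camera-capture module.
broker = Broker()
received = []
broker.subscribe("video/frames", received.append)
broker.publish("video/frames", {"frame_id": 0})
broker.publish("audio/samples", b"\x00")  # no subscriber; silently dropped
```

This decoupling is what makes ‘bridging modules’ possible: a bridge simply subscribes to local topics and republishes the messages into a foreign framework such as Psyclone or ActiveMQ (and vice versa).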
To demonstrate the HCI^2 Framework’s capability to integrate heterogeneous modules, we present several example modules working with a variety of hardware and software. We also present two use cases of the HCI^2 Framework: a computer game, called CamGame, based on hand-held marker(s) and low-cost camera(s), and the human affective-signal analysis component of the Fun Robotic Outdoor Guide (FROG) project. Using the HCI^2 Framework, we further developed the Mimic-Me Game, an interactive game played with the NAO humanoid robot. The game involves the robot ‘mimicking’ the player’s facial expression using a combination of body gestures and audio cues. A multimodal dialogue model was designed and implemented to enable the robot to interact with the human player in a naturalistic way using only natural language, head movements and facial expressions.
Content Version: Open Access
Issue Date: Feb-2014
Date Awarded: Oct-2014
Supervisor: Pantic, Maja
Sponsor/Funder: European Research Council
European Commission
Funder's Grant Number: ERC Starting Grant Agreement ERC-2007-StG-203143 (MAHNOB)
Grant 288235 (FROG)
Department: Computing
Publisher: Imperial College London
Qualification Level: Doctoral
Qualification Name: Doctor of Philosophy (PhD)
Appears in Collections:Computing PhD theses

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
