Perspective taking in robots: A framework and computational model
Author(s)
Fischer, Tobias
Type
Thesis or dissertation
Abstract
Humans are inherently social beings that benefit from their perceptual capability to adopt another point of view. This thesis examines this capability, termed perspective taking, using a mixed forward/reverse engineering approach. While previous approaches were limited to known, artificial environments, the proposed approach results in a perceptual framework that can be used in unconstrained environments while at the same time detailing the mechanisms that humans use to infer the world's characteristics from another viewpoint.
First, the thesis explores a forward engineering approach by outlining the required perceptual components and implementing these components on a humanoid iCub robot. Prior to and during perspective taking, the iCub learns the environment and recognizes its constituent objects before approximating the gaze of surrounding humans based on their head poses. Inspired by psychological studies, two separate mechanisms are employed for the two types of perspective taking: one based on line-of-sight tracing and another based on mental rotation of the environment.
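The abstract names two mechanisms without detailing them. As a loose, hypothetical sketch (not the thesis's actual implementation), the two ideas can be illustrated as follows: level-1 perspective taking as a geometric line-of-sight test against simplified spherical obstacles, and level-2 perspective taking as re-expressing scene points in the other observer's reference frame. All function names and the sphere-obstacle simplification are this sketch's assumptions.

```python
import numpy as np

def line_of_sight_visible(observer, target, obstacles, radius=0.1):
    """Level-1 perspective taking (sketch): return True if `target` is
    visible from `observer`, i.e. no spherical obstacle of the given
    radius intersects the observer-to-target line of sight."""
    direction = target - observer
    dist = np.linalg.norm(direction)
    unit = direction / dist
    for centre in obstacles:
        # Closest point on the observer->target segment to the obstacle centre.
        t = np.clip(np.dot(centre - observer, unit), 0.0, dist)
        closest = observer + t * unit
        if np.linalg.norm(centre - closest) < radius:
            return False
    return True

def mental_rotation(points, rotation, translation):
    """Level-2 perspective taking (sketch): express world-frame scene
    points in another observer's frame, where `rotation` has the
    observer's axes as columns (in world coordinates) and `translation`
    is the observer's position."""
    return (np.asarray(points) - translation) @ rotation
```

For example, an obstacle placed directly between observer and target makes `line_of_sight_visible` return `False`, while an off-axis obstacle leaves the target visible.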
Acknowledging that human head pose is only a rough indication of a human's viewpoint, the thesis introduces a novel, automated approach for ground-truth eye gaze annotation. This approach is used to collect a new dataset covering a wide range of camera-subject distances, head poses, and gazes. A novel gaze estimation method trained on this dataset outperforms previous methods in close-distance scenarios, while also allowing eye gaze estimation at the large camera-subject distances commonly encountered in human-robot interactions.
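One common way to obtain ground-truth gaze automatically, which may or may not match the thesis's exact pipeline, is to have the subject fixate a tracked target: the ground-truth gaze is then simply the unit vector from the eye to that target, often reported as yaw/pitch angles. The sketch below assumes a coordinate frame with x right, y up, z forward; the function names are this sketch's own.

```python
import math
import numpy as np

def ground_truth_gaze(eye_position, target_position):
    """Unit gaze vector from the eye to a fixated target point
    (assumes both positions are known, e.g. from tracking)."""
    v = np.asarray(target_position, dtype=float) - np.asarray(eye_position, dtype=float)
    return v / np.linalg.norm(v)

def gaze_to_yaw_pitch(gaze):
    """Convert a unit gaze vector to (yaw, pitch) in radians,
    assuming x right, y up, z forward."""
    yaw = math.atan2(gaze[0], gaze[2])
    pitch = math.asin(gaze[1])
    return yaw, pitch
```

A target straight ahead of the eye yields yaw and pitch of zero; a target offset to the side yields a nonzero yaw proportional to the viewing angle.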
Finally, the thesis proposes a computational model as an instantiation of a reverse engineering approach, with the aim of understanding the underlying mechanisms of perspective taking in humans. The model contains a set of forward models as building blocks, and an attentional component that reduces the model's response times. The model explains human data in congruency-matching experiments and suggests that humans implement a similar attentional mechanism. Several testable predictions are put forward, including the prediction that forced early responses lead to an egocentric bias. Experimental results on the computational formalization of perspective taking also open up future possibilities of exploring links to other perceptual and cognitive mechanisms, such as active vision and autobiographical memories.
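The model itself is not reproduced in the abstract. As a deliberately toy sketch of two of its stated ideas, under the classic assumption that mental-rotation response time grows linearly with angular disparity: an attentional component that restricts processing to a fraction of the scene shortens the rotation cost, and a response deadline that falls before rotation completes forces an egocentric answer. All parameter values and names here are illustrative assumptions, not the thesis's fitted model.

```python
import math

def simulated_response_time(angular_disparity_deg, base_rt=0.5,
                            rate_s_per_deg=0.004, attended_fraction=1.0):
    """Toy reading: response time (seconds) grows linearly with the
    angle between egocentric and target viewpoints; attending to only
    a fraction of the scene scales down the rotation cost."""
    return base_rt + rate_s_per_deg * angular_disparity_deg * attended_fraction

def forced_early_response(other_view_answer, egocentric_answer,
                          deadline, required_rt):
    """Toy prediction: if the deadline falls before the mental
    rotation can complete, the response reflects the egocentric view."""
    return other_view_answer if deadline >= required_rt else egocentric_answer
```

In this toy reading, halving the attended fraction halves the angle-dependent part of the response time, and any deadline shorter than the required rotation time produces the predicted egocentric bias.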
Version
Open Access
Date Issued
2018-09
Date Awarded
2019-01
Copyright Statement
Creative Commons Attribution NonCommercial NoDerivatives Licence
Advisor
Demiris, Yiannis
Sponsor
European Commission
Samsung Advanced Institute of Technology
Grant Number
643783-PAL
612139-WYSIWYD
Publisher Department
Electrical and Electronic Engineering
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)