Align-and-repeat: one-shot visual imitation learning using imagination

File: Uzun-M-2022-MPhil-Thesis.pdf (Thesis, 10.6 MB, Adobe PDF)
Title: Align-and-repeat: one-shot visual imitation learning using imagination
Authors: Uzun, Murat
Item Type: Thesis or dissertation
Abstract: Visual imitation learning is a compelling framework that enables robotic agents to perform tasks from expert demonstrations. However, collecting high-quality expert behavior is currently costly, as it requires manual kinesthetic teaching or tele-operation. Furthermore, since deep learning lies at the foundation of raw sensory input processing, erroneous imitation behavior can be observed, especially when the operating conditions are not fully covered by the training data. In this work, we first study predictive uncertainty estimation in deep learning, comparing the most commonly used methods in the literature in the context of robotic object manipulation. Our empirical analysis exposes the challenges of uncertainty calibration and out-of-distribution robustness of neural network models. We then propose a practical method for one-shot visual imitation that enables robots to solve a number of everyday table-top manipulation tasks using a single expert demonstration. Our method does not assume any task-relevant prior knowledge or 3D object models. The expert demonstration, i.e., end-effector velocities, is collected together with the corresponding depth measurements using a calibrated wrist-mounted camera. For each task, we divide the robot motion into two phases: view-point alignment and interaction. The proposed approach first learns to reconstruct an orthographic render of the workspace via an encoder-decoder neural scene rendering model that is trained offline. The imagined full top-down render of the live scene, including unexplored and occluded regions, is then used to guide the end-effector to collect task-relevant visual information. After sufficient visual exploration, the first frame of the live scene is aligned with the first frame of the demonstration scene by estimating a rigid transformation matrix, so that we can simply repeat the expert actions. This allows executing complex behavior, i.e., a 3D trajectory, without explicitly learning a policy. We conduct a number of manipulation experiments in simulation, comparing our approach with similar baselines, and demonstrate its superior performance.
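Note: the align-and-repeat step can be illustrated with a short sketch. The code below is not taken from the thesis; it is a minimal, hypothetical Python example of the general idea, assuming corresponded 3D points between the first frames of the demonstration and live scenes are already available (e.g., from the imagined top-down renders), and using the standard Kabsch/SVD solution for the rigid transform. The function names estimate_rigid_transform and repeat_demo are illustrative.

    import numpy as np

    def estimate_rigid_transform(src, dst):
        """Estimate R, t such that dst ~= R @ src + t (Kabsch/SVD).

        src, dst: (N, 3) arrays of corresponded 3D points, e.g. demo-scene
        and live-scene keypoints back-projected from depth.
        """
        src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
        H = (src - src_c).T @ (dst - dst_c)
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # proper rotation, det(R) = +1
        t = dst_c - R @ src_c
        return R, t

    def repeat_demo(demo_velocities, R):
        """Map demonstrated end-effector linear velocities into the live frame.

        Velocities are direction vectors, so only the rotation applies; the
        translation is absorbed by the initial alignment move.
        """
        return demo_velocities @ R.T

    # Hypothetical usage with synthetic correspondences.
    rng = np.random.default_rng(0)
    demo_pts = rng.uniform(-0.2, 0.2, size=(50, 3))
    true_R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(true_R) < 0:
        true_R[:, 0] *= -1.0                      # keep a proper rotation
    live_pts = demo_pts @ true_R.T + np.array([0.05, -0.03, 0.0])

    R, t = estimate_rigid_transform(demo_pts, live_pts)
    demo_vels = rng.normal(size=(100, 3))         # recorded expert velocities
    live_vels = repeat_demo(demo_vels, R)         # velocities to replay live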
Content Version: Open Access
Issue Date: Oct-2021
Date Awarded: Apr-2022
URI: http://hdl.handle.net/10044/1/96993
DOI: https://doi.org/10.25560/96993
Copyright Statement: Creative Commons Attribution NonCommercial Licence
Supervisor: Johns, Edward
Sponsor/Funder: Imperial College London
Department: Computing
Publisher: Imperial College London
Qualification Level: Masters
Qualification Name: Master of Philosophy (MPhil)
Appears in Collections: Computing PhD theses