Align-and-repeat: one-shot visual imitation learning using imagination
| File | Description | Size | Format |
| --- | --- | --- | --- |
| Uzun-M-2022-MPhil-Thesis.pdf | Thesis | 10.6 MB | Adobe PDF |
Title: Align-and-repeat: one-shot visual imitation learning using imagination
Authors: Uzun, Murat
Item Type: Thesis or dissertation
Abstract: Visual imitation learning is a compelling framework that enables robotic agents to perform tasks from expert demonstrations. Collecting high-quality expert behavior, however, is currently costly, as it requires manual kinesthetic teaching or tele-operation. Furthermore, because deep learning underpins the processing of raw sensory input, erroneous imitation behavior can arise, especially when the operating conditions are not fully covered by the training data. In this work, we first study predictive uncertainty estimation in deep learning, comparing the most commonly used methods in the literature in the context of robotic object manipulation. Our empirical analysis exposes the challenges of uncertainty calibration and out-of-distribution robustness in neural network models. We then propose a practical method for one-shot visual imitation that enables robots to solve a number of everyday table-top manipulation tasks from a single expert demonstration. Our method does not assume any task-relevant prior knowledge or 3D object models. The expert demonstration, i.e., end-effector velocities, is collected together with the corresponding depth measurements using a calibrated wrist-mounted camera. For each task, we divide the robot motion into two phases: view-point alignment and interaction. The proposed approach first learns to reconstruct an orthographic render of the workspace via an encoder-decoder neural scene rendering model trained offline. The imagined full top-down render of the live scene, including unexplored and occluded regions, is then used to guide the end-effector to collect task-relevant visual information. After sufficient visual exploration, the first frame of the live scene is aligned with the first frame of the demonstration scene by estimating a rigid transformation matrix, so that we can simply repeat the expert actions. This allows complex behavior, i.e., a 3D trajectory, to be executed without explicitly learning a policy. We conduct a number of manipulation experiments in simulation, comparing our approach with similar baselines and demonstrating its superior performance.
Content Version: Open Access
Issue Date: Oct-2021
Date Awarded: Apr-2022
URI: http://hdl.handle.net/10044/1/96993
DOI: https://doi.org/10.25560/96993
Copyright Statement: Creative Commons Attribution NonCommercial Licence
Supervisor: Johns, Edward
Sponsor/Funder: Imperial College London
Department: Computing
Publisher: Imperial College London
Qualification Level: Masters
Qualification Name: Master of Philosophy (MPhil)
Appears in Collections: Computing PhD theses
This item is licensed under a Creative Commons License
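The alignment step described in the abstract, estimating a rigid transformation between the first frame of the live scene and the first frame of the demonstration and then repeating the expert end-effector actions in the aligned frame, can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes matched 3D point correspondences between the two frames are already available (the thesis derives alignment from imagined top-down renders instead), uses the standard Kabsch/Umeyama closed-form solution for the rotation and translation, and simply rotates the demonstrated linear velocities into the live frame. The function names and the synthetic usage data are hypothetical.

```python
# Minimal "align-and-repeat" sketch under the assumptions stated above; not the thesis code.
import numpy as np

def estimate_rigid_transform(demo_pts: np.ndarray, live_pts: np.ndarray):
    """Kabsch/Umeyama estimate of R, t such that live ≈ R @ demo + t.

    demo_pts, live_pts: (N, 3) arrays of matched 3D points (assumed correspondences).
    """
    demo_c = demo_pts.mean(axis=0)
    live_c = live_pts.mean(axis=0)
    H = (demo_pts - demo_c).T @ (live_pts - live_c)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))            # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = live_c - R @ demo_c
    return R, t

def repeat_demo(demo_velocities: np.ndarray, R: np.ndarray) -> np.ndarray:
    """Rotate demonstrated end-effector linear velocities (T, 3) into the live frame."""
    return (R @ demo_velocities.T).T

# Hypothetical usage with synthetic data: a known rotation/translation is recovered,
# then a short demonstrated velocity sequence is mapped into the live frame.
rng = np.random.default_rng(0)
demo_pts = rng.normal(size=(100, 3))
theta = 0.5
true_R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])   # rotation about z
true_t = np.array([0.2, -0.1, 0.05])
live_pts = demo_pts @ true_R.T + true_t

R, t = estimate_rigid_transform(demo_pts, live_pts)
demo_vel = rng.normal(size=(5, 3))                    # 5 time steps of (vx, vy, vz)
live_vel = repeat_demo(demo_vel, R)
print(np.allclose(R, true_R), np.allclose(t, true_t))  # True True
```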