Visual reasoning for robotics manipulation
File(s)
Author(s)
Nazarczuk, Michal
Type
Thesis or dissertation
Abstract
This thesis focuses on enabling robotic agents to interact with the world around them based on human-given instructions and environmental feedback signals. Embodied Artificial Intelligence is an active research direction that addresses problems of understanding and interacting with one's surroundings. While humans seamlessly process multimodal sensory inputs, machines struggle to combine different input modalities. Deep learning models have achieved remarkable success in tasks involving single modalities, whereas multimodal approaches remain an active area of research. This thesis advances reasoning pipelines by proposing and implementing new multimodal solutions and by presenting ways of increasing the availability of training data. We enable efficient communication between perception (reasoning) and interaction (manipulation) systems, paving the way towards more advanced and sophisticated robots. The first part of the thesis introduces a new benchmark for visual reasoning that is challenging in terms of both object perception and scene composition. Further, we propose a multimodal approach that combines textual descriptions and visual input to perform reasoning. Moreover, we introduce a new task, Vision to Action, that integrates Visual Reasoning and Action Planning through a compound query to an agent; we present a suitable dataset and an approach that produces a sequence of actions given a query. We also propose a system that integrates a reasoning apparatus with an embodied agent in simulated and real environments, and we evaluate our approach in a real-world scenario. Then, we introduce new simulation environments for interactive embodied reasoning that focus on high-quality visual observations while maintaining accurate physics simulation of object interactions.
We propose a modular simulation environment for robotic manipulation, along with a dataset and benchmark for multi-step tabletop interactive reasoning and manipulation (including reasoning based on the physical properties of objects). Finally, we present a closed-loop approach to interactive embodied reasoning, with experiments in both simulated and real-world scenarios.
Version
Open Access
Date Issued
2023-07
Date Awarded
2024-02
Copyright Statement
Creative Commons Attribution NonCommercial Licence
Advisor
Mikolajczyk, Krystian
Publisher Department
Department of Electrical and Electronic Engineering
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)