Scene understanding for 3D multi-object scenes: labelling, reasoning and decomposing
Author(s)
Landgraf, Zoe
Type
Thesis
Abstract
Scene understanding is a crucial component of intelligent computer vision systems, which are an integral part of most robotic devices, including augmented reality glasses, autonomous cars and future household robots. With the tremendous progress computer vision methods have experienced since the introduction of machine learning, and in particular deep learning, to the field, intelligent vision systems seem to be within reach. Some successful applications, such as face detection and augmentation, have already found their way into our smartphones. However, many challenges remain unsolved, especially in the 3D domain. This thesis addresses three areas of 3D scene understanding with a focus on scenes composed of multiple small objects, which can pose particular challenges and are overall less explored.
Labelling: In a first study, two common deep-learning-based methods for adding semantic labels to a 3D reconstruction in real-time SLAM are compared and evaluated, providing valuable insights into the settings which favour one approach or the other. The experiments are conducted for a table-top setting of small objects and under the assumption that a mobile device can extensively scan the scene.
Reasoning: However, such an ideal setting is not always possible, and the second challenge addressed in this thesis is that of estimating the content of multi-object scenes when only one or a few views are available. To this end, a novel generative neural network architecture is proposed which can generate the full 3D shape and instance segmentation of a scene from a single depth image.
Decomposing: The method proposed in the second study encodes all features jointly, which leaves no room for scene manipulation within the representation itself. The last experiments explore the question of how best to represent 3D scenes in a compositional way. Compositional, object-level encodings have several advantages, including control over individual features and objects within a scene, the ability to produce novel compositions, and a suitable structure for modelling interactions between scene components. To this end, a novel method for generating a factorised latent representation is proposed, which encodes a scene into a set of per-object latent codes. The proposed method is able to decompose a scene of convex shapes into its components, even from a single viewpoint.
We hope that the research results of this thesis can provide insights for solving some of the open problems of 3D scene understanding and that the proposed solutions are relevant for future work and applications.
Version
Open Access
Date Issued
2022-04
Date Awarded
2023-03
Copyright Statement
Creative Commons Attribution NonCommercial Licence
Advisor
Davison, Andrew
Leutenegger, Stefan
Sponsor
Dyson
Publisher Department
Computing
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)