SLAM and deep learning for 3D indoor scene understanding
| File | Description | Size | Format |
| --- | --- | --- | --- |
| McCormac-J-2019-PhD-Thesis.pdf | Thesis | 45.5 MB | Adobe PDF |
Title: SLAM and deep learning for 3D indoor scene understanding
Authors: McCormac, Brendan John
Item Type: Thesis or dissertation
Abstract: We build upon research in the fields of Simultaneous Localisation and Mapping (SLAM) and Deep Learning to develop 3D maps of indoor scenes that not only describe where things are but what they are. We focus on real-time online methods suitable for applications such as domestic robotics and augmented reality. While early approaches to SLAM used sparse feature maps for localisation, recent years have seen the advent of real-time dense SLAM systems, which have enabled applications not possible with sparse feature maps alone. Further augmenting dense maps with semantic information will in future enable more intelligent domestic robots and more intuitive human-map interactions not possible with map geometry alone. Early work presented here sought to combine recent advances in semantic segmentation using Convolutional Neural Networks (CNNs) with dense SLAM approaches to produce a semantically annotated dense 3D map. Although we found this combination improved segmentation performance, its inherent limitations subsequently led to a paradigm shift away from semantic annotation towards instance detection and 3D object-level mapping. We propose a new type of SLAM system consisting of discovered object instances that are reconstructed online in individual volumes. We develop a new approach to robustly combine multiple associated 2D instance mask detections into a fused 3D foreground segmentation for each object. The use of individual volumes allows the relative poses of objects to be optimised in a pose-graph, producing a consistent global map that allows objects to be reused on loopy trajectories, and which can improve reconstruction quality. A notable feature of CNNs is their ability to make use of large annotated datasets, and so we also explore methods to reduce the cost of indoor semantic dataset production. We explore SLAM as a means of mitigating the labour-intensive annotation of video data, but find that producing a large-scale dataset with such an approach would still require significant resources. We therefore explore automated methods to produce a large-scale photorealistic synthetic dataset of indoor trajectories at low cost, and we verify the benefits of the dataset on the task of semantic segmentation. To automate trajectory generation we present a novel two-body random trajectory method that mitigates issues of a completely random approach, and which has subsequently been used in other synthetic indoor datasets.
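The abstract's idea of fusing multiple associated 2D instance mask detections into a per-object 3D foreground segmentation can be illustrated with a probabilistic log-odds update, in which each voxel that projects inside a detection mask receives a foreground vote and each voxel inside the detection region but outside the mask receives a background vote. This is only a hedged sketch of that style of fusion, not the thesis's actual implementation; the function name, the toy 2D "volume", and the probabilities `p_fg`/`p_bg` are all illustrative assumptions.

```python
import numpy as np

def fuse_mask_into_volume(log_odds, mask, p_fg=0.7, p_bg=0.3):
    """Fuse one 2D instance-mask observation into a per-voxel
    foreground log-odds volume (a toy 2D array here for brevity).

    Voxels projecting inside the mask receive a foreground update;
    voxels in the observed region outside the mask receive a
    background update. p_fg/p_bg are assumed detector reliabilities.
    """
    update = np.where(mask,
                      np.log(p_fg / (1.0 - p_fg)),   # evidence for foreground
                      np.log(p_bg / (1.0 - p_bg)))   # evidence against
    return log_odds + update

# Toy example: a 4x4 slice observed by two overlapping detections.
vol = np.zeros((4, 4))
mask1 = np.zeros((4, 4), dtype=bool); mask1[1:3, 1:3] = True
mask2 = np.zeros((4, 4), dtype=bool); mask2[1:3, 1:4] = True
for m in (mask1, mask2):
    vol = fuse_mask_into_volume(vol, m)

# Voxels with positive accumulated log-odds are judged foreground.
foreground = vol > 0.0
```

Accumulating evidence in log-odds space makes each fusion step a cheap addition, and voxels seen as foreground in only a minority of detections naturally fall back below the decision threshold.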
Content Version: Open Access
Issue Date: Oct-2018
Date Awarded: Mar-2019
URI: http://hdl.handle.net/10044/1/68466
DOI: https://doi.org/10.25560/68466
Copyright Statement: Creative Commons Attribution NonCommercial NoDerivatives Licence
Supervisors: Davison, Andrew; Leutenegger, Stefan
Sponsor/Funder: James Dyson Foundation
Funder's Grant Number: COVIP-P47511
Department: Computing
Publisher: Imperial College London
Qualification Level: Doctoral
Qualification Name: Doctor of Philosophy (PhD)
Appears in Collections: Computing PhD theses