Improving Monocular Depth Estimation using auxiliary information
File | Description | Size | Format |
---|---|---|---|
Auty-D-2024-PhD-Thesis.pdf | Thesis | 45.99 MB | Adobe PDF |
Title: | Improving Monocular Depth Estimation using auxiliary information |
Authors: | Auty, Dylan |
Item Type: | Thesis or dissertation |
Abstract: | Monocular Depth Estimation (MDE) is the problem of estimating the distance from the camera to every part of the scene shown in a single input image. It is challenging due to its inherent ambiguity, but Deep Learning (DL) methods perform well, typically by learning to interpret their inputs implicitly by minimising prediction error. However, this approach hides a significant weakness: the model must waste limited capacity, training data, and computation on discovering which parts of the input are important before it can learn to interpret them. This work addresses this inefficiency by answering the question: how can auxiliary information, sourced from outside the training data, be used to improve MDE performance? Biological depth cues are shown to be useful in a DL context. Then, a novel module is proposed that encourages the model to focus on inter-object relationships. Language models are investigated as a source of object semantics and are shown to improve performance. A prompt learning technique is proposed that uses a joint vision-and-language model to directly predict depth. The learned prompts are found not to map to depth-related words, implying that human language is inadequate for describing depth. Cross-task knowledge distillation is investigated as a way to provide implicit knowledge to an MDE model. A method is proposed that effectively transfers knowledge from teachers trained for non-MDE tasks, and it is extended to a novel teacher-free loss. The proposed methods successfully leverage auxiliary information from biological depth cues, auxiliary semantic models, human language, and the latent space of non-MDE models to improve MDE performance. The success of the various methods presented shows that the fully implicit, end-to-end paradigm must be revised to obtain optimal MDE performance, and avenues for possible future work are discussed. |
Content Version: | Open Access |
Issue Date: | Feb-2024 |
Date Awarded: | Sep-2024 |
URI: | http://hdl.handle.net/10044/1/115157 |
DOI: | https://doi.org/10.25560/115157 |
Copyright Statement: | Creative Commons Attribution NonCommercial Licence |
Supervisor: | Mikolajczyk, Krystian |
Department: | Electrical and Electronic Engineering |
Publisher: | Imperial College London |
Qualification Level: | Doctoral |
Qualification Name: | Doctor of Philosophy (PhD) |
Appears in Collections: | Electrical and Electronic Engineering PhD theses |
This item is licensed under a Creative Commons License