Improving Monocular Depth Estimation using auxiliary information

File: Auty-D-2024-PhD-Thesis.pdf
Description: Thesis
Size: 45.99 MB
Format: Adobe PDF
Title: Improving Monocular Depth Estimation using auxiliary information
Authors: Auty, Dylan
Item Type: Thesis or dissertation
Abstract: Monocular Depth Estimation (MDE) is the problem of estimating the distance from the camera to every part of the scene shown in a single input image. It is challenging due to its inherent ambiguity, but Deep Learning (DL) methods perform well, typically by learning to interpret their inputs implicitly by minimising prediction error. However, this approach hides a significant weakness: the model must waste limited capacity, training data, and computation on discovering what parts of the input are important, before it can learn to interpret them. This work addresses this inefficiency, answering the question: how can auxiliary information, sourced from outside the training data, be used to improve MDE performance? Biological depth cues are shown to be useful in a DL context. Then, a novel module is proposed that encourages the model to focus on inter-object relationships. Language models are investigated as a source of object semantics, and are shown to improve performance. A prompt learning technique is proposed that uses a joint vision-and-language model to directly predict depth. The learned prompts are found not to map to depth-related words, implying that human language is inadequate for describing depth. Cross-task knowledge distillation is investigated to provide implicit knowledge to an MDE model. A method is proposed that effectively transfers knowledge from teachers trained for non-MDE tasks, and is extended to a novel teacher-free loss. The proposed methods successfully leverage auxiliary information from biological depth cues, auxiliary semantic models, human language, and the latent space of non-MDE models to improve MDE performance. The success of the various methods presented shows that the fully implicit, end-to-end paradigm must be revised to obtain optimal MDE performance, and avenues for possible future work are discussed.
Content Version: Open Access
Issue Date: Feb-2024
Date Awarded: Sep-2024
URI: http://hdl.handle.net/10044/1/115157
DOI: https://doi.org/10.25560/115157
Copyright Statement: Creative Commons Attribution NonCommercial Licence
Supervisor: Mikolajczyk, Krystian
Department: Electrical and Electronic Engineering
Publisher: Imperial College London
Qualification Level: Doctoral
Qualification Name: Doctor of Philosophy (PhD)
Appears in Collections: Electrical and Electronic Engineering PhD theses



This item is licensed under a Creative Commons License.