3D Vision for improved scene understanding in robotic surgery

File: Tukra-S-2024-PhD-Thesis.pdf
Description: Thesis
Size: 74.87 MB
Format: Adobe PDF
Title: 3D Vision for improved scene understanding in robotic surgery
Authors: Tukra, Samyakh
Item Type: Thesis or dissertation
Abstract: Robotics and Mixed Reality (MR) visualisation are revolutionising Minimally Invasive Surgery (MIS), enhancing precision and efficiency. These technologies promise better patient outcomes, shorter recovery times, and fewer complications. A key application is guiding tumour resection, where MR can overlay tissue characteristics from intra- and pre-operative sensors as dynamic holograms on anatomical structures, augmenting the surgeon's view. However, clinical adoption has been slow owing to challenges with real-time autonomy, robustness to moving objects, and soft-tissue deformation. Addressing these issues requires accurate, occlusion-aware, geometrically and temporally consistent depth estimation in complex, dynamic environments. In this thesis, I tackle four key challenges: (1) occlusions causing ambiguous depth, (2) sub-millimetre depth accuracy, (3) dynamic motion, and (4) temporal inconsistency. I propose Deep Learning (DL) solutions based on self-supervised learning and generative modelling, circumventing the need for ground-truth data, which is scarce in surgery. First, I developed a bespoke video generative model that removes occlusions from a video and generates new frames with high spatial fidelity. To improve depth estimation accuracy, I designed a multi-scale perceptual loss function that achieves state-of-the-art performance even with a randomly connected neural network, surpassing many stereo models (see the sketch below). I further adapted this approach to a stereo depth estimation model that achieves sub-millimetre accuracy without surgical training data. For dynamic motion and temporal consistency, I introduced a novel method that trains two models: (i) a Multi-Layer Perceptron (MLP) for 3D scene-flow prediction and (ii) a video depth estimation model that learns temporal dynamics via diffusion. The two models optimise each other to ensure temporally consistent depth estimation, even in highly dynamic environments. Finally, I present an MLOps pipeline for deploying DL models on MR hardware, such as the Microsoft HoloLens, via the cloud for clinical use.
Content Version: Open Access
Issue Date: May-2023
Date Awarded: Oct-2024
URI: http://hdl.handle.net/10044/1/115572
DOI: https://doi.org/10.25560/115572
Copyright Statement: Creative Commons Attribution NonCommercial Licence
Supervisor: Giannarou, Stamatia
Sponsor/Funder: Royal Society (Great Britain); National Institute for Health Research (Great Britain)
Funder's Grant Number: UF140290; RGF\EA\180084
Department: Department of Surgery and Cancer
Publisher: Imperial College London
Qualification Level: Doctoral
Qualification Name: Doctor of Philosophy (PhD)
Appears in Collections: Department of Surgery and Cancer PhD Theses
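
The multi-scale perceptual loss mentioned in the abstract compares predicted and target images in the feature space of a pre-trained network at several depths, rather than in pixel space alone. Below is a minimal PyTorch sketch of that idea; the VGG16 backbone, layer cuts, and equal per-scale weighting are illustrative assumptions, not the implementation from the thesis.

```python
# Minimal sketch of a multi-scale perceptual loss, assuming a frozen
# pre-trained VGG16 backbone. Layer cuts and weighting are illustrative
# assumptions, not the thesis implementation.
import torch
import torch.nn.functional as F
from torch import nn
from torchvision import models

class MultiScalePerceptualLoss(nn.Module):
    def __init__(self):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
        # Split VGG16 into blocks so features are compared at several scales.
        self.blocks = nn.ModuleList([vgg[:4], vgg[4:9], vgg[9:16]])
        for p in self.parameters():
            p.requires_grad_(False)  # the feature extractor stays frozen

    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # pred/target: (B, 3, H, W) images, normalised with ImageNet stats.
        loss = F.l1_loss(pred, target)  # pixel-level term
        x, y = pred, target
        for block in self.blocks:
            x, y = block(x), block(y)      # deeper block = coarser scale
            loss = loss + F.l1_loss(x, y)  # feature-level term at this scale
        return loss
```

In a self-supervised stereo setup, `pred` could be one view reconstructed from the other via the predicted disparity, so the loss supervises the depth network without ground-truth depth.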



This item is licensed under a Creative Commons Licence.