3D Vision for improved scene understanding in robotic surgery
File | Description | Size | Format
---|---|---|---
Tukra-S-2024-PhD-Thesis.pdf | Thesis | 74.87 MB | Adobe PDF
Title: 3D Vision for improved scene understanding in robotic surgery
Authors: Tukra, Samyakh
Item Type: Thesis or dissertation
Abstract: Robotics and Mixed Reality (MR) visualisation are revolutionising Minimally Invasive Surgery (MIS), enhancing precision and efficiency. These technologies promise better patient outcomes, shorter recovery times, and reduced complications. A key application is guiding tumor resection, where MR can overlay tissue characteristics from intra- and pre-operative sensors as dynamic holograms on anatomical structures, enhancing the surgeon’s view. However, its clinical adoption has been slow due to challenges with real-time autonomy, robustness against moving objects, and soft-tissue deformation. Addressing these issues requires accurate, occlusion-aware, geometrically and temporally consistent depth estimation in complex, dynamic environments. In this thesis, I tackle four key challenges: (1) occlusions causing ambiguous depth, (2) sub-millimeter depth accuracy, (3) dynamic motion, and (4) temporal inconsistency. I propose Deep Learning (DL) solutions focused on self-supervised learning and generative modeling, circumventing the need for scarce ground truth data in surgery. First, I developed a bespoke video generative model that removes occlusions from a video and generates new frames with high spatial fidelity. To enhance depth estimation accuracy, I designed a multi-scale perceptual loss function that achieves State-of-the-Art performance even with a randomly connected neural network, surpassing many stereo models. I further adapted this approach to a stereo depth estimation model that achieves sub-millimeter accuracy without surgical training data. For dynamic motion and temporal consistency, I introduced a novel method that trains two models: (i) a Multi-Layer Perceptron (MLP) for 3D scene-flow prediction and (ii) a video depth estimation model that learns temporal dynamics via Diffusion. These models optimise each other to ensure temporal consistency in depth estimation, even in highly dynamic environments. Finally, I present an MLOps pipeline for deploying DL models on MR hardware, such as the Microsoft HoloLens, via the Cloud for clinical use.
Content Version: Open Access
Issue Date: May-2023
Date Awarded: Oct-2024
URI: http://hdl.handle.net/10044/1/115572
DOI: https://doi.org/10.25560/115572
Copyright Statement: Creative Commons Attribution NonCommercial Licence
Supervisor: Giannarou, Stamatia
Sponsor/Funder: Royal Society (Great Britain); National Institute for Health Research (Great Britain)
Funder's Grant Number: UF140290; RGF\EA\180084
Department: Department of Surgery and Cancer
Publisher: Imperial College London
Qualification Level: Doctoral
Qualification Name: Doctor of Philosophy (PhD)
Appears in Collections: Department of Surgery and Cancer PhD Theses
This item is licensed under a Creative Commons License
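The abstract mentions a multi-scale perceptual loss used to supervise depth estimation without ground-truth depth. As a rough illustration of that general idea only, the sketch below shows a generic multi-scale perceptual loss in PyTorch. The VGG16 feature backbone, the layer indices, and the per-scale weights are assumptions made for this example; they are not taken from the thesis itself.

```python
# Illustrative sketch of a multi-scale perceptual loss (NOT the thesis implementation).
# Assumptions: a frozen VGG16 feature extractor, features taken after four ReLU layers,
# and simple per-scale weights combined with a pixel-level L1 term.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class MultiScalePerceptualLoss(nn.Module):
    def __init__(self, layer_ids=(3, 8, 15, 22), weights=(1.0, 0.75, 0.5, 0.25)):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)          # frozen feature extractor
        self.vgg = vgg
        self.layer_ids = set(layer_ids)      # ReLU outputs at several depths (scales)
        self.weights = weights

    def _features(self, x):
        feats = []
        for i, layer in enumerate(self.vgg):
            x = layer(x)
            if i in self.layer_ids:
                feats.append(x)
        return feats

    def forward(self, pred, target):
        # pred, target: (B, 3, H, W) images. In a self-supervised stereo/video setting,
        # pred would typically be a view reconstructed by warping with the predicted
        # depth, so matching it to the observed view indirectly supervises the depth.
        loss = F.l1_loss(pred, target)                          # pixel-level term
        for w, fp, ft in zip(self.weights, self._features(pred), self._features(target)):
            loss = loss + w * F.l1_loss(fp, ft)                 # feature-level terms
        return loss
```

Because the supervision signal comes from image reconstruction rather than depth labels, a loss of this kind is one way to train depth networks when ground-truth depth is scarce, as in surgical scenes.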