
Incorporating prior knowledge into deep neural networks without handcrafted features

File: Garnelo_Abellanas-M-2021-PhD-Thesis.pdf
Description: Thesis
Size: 25.64 MB
Format: Adobe PDF
Title: Incorporating prior knowledge into deep neural networks without handcrafted features
Authors: Garnelo Abellanas, Marta
Item Type: Thesis or dissertation
Abstract: Deep learning (DL) is currently the largest area of research within artificial intelligence (AI). This success can largely be attributed to the data-driven nature of DL algorithms: unlike previous approaches in AI, which required extensive handcrafting and human intervention, DL models can be implemented and trained with little to no human involvement. The lack of handcrafting, however, can be a double-edged sword. DL algorithms are notorious for producing uninterpretable features, generalising badly to new tasks and relying on extraordinarily large datasets for training. In this thesis, on the assumption that these shortcomings are symptoms of the under-constrained training setup of deep networks, we address the question of how to incorporate knowledge into DL algorithms without reverting to complete handcrafting, in order to train more data-efficient algorithms. We start by motivating this line of work with an example of a DL architecture which, inspired by symbolic AI, aims at extracting symbols from a simple environment and using them to learn downstream tasks quickly. Our proof-of-concept model shows that it is possible to address some of the data-efficiency issues, as well as to obtain more interpretable representations, by reasoning at this higher level of abstraction. Our second approach to data efficiency is based on pre-training: the idea is to pre-train some parts of the DL network on a different, but related, task in order to first learn useful feature extractors. For our experiments we pre-train the encoder of a reinforcement learning agent on a 3D scene-prediction task and then use the features produced by the encoder to train a simulated robot arm on a reaching task. Crucially, unlike previous approaches that could only learn from fixed viewpoints, we are able to train an agent using observations captured from randomly changing positions around the robot arm, without having to train a separate policy for each observation position. Lastly, we focus on how to build in prior knowledge through the choice of dataset. To this end, instead of training DL models on a single dataset, we train them on a distribution over datasets that captures the space of tasks we are interested in. This training regime produces models that can flexibly adapt to any dataset within the distribution at test time. Crucially, they need only a small number of observations in order to adapt their predictions, thus addressing the data-efficiency challenge at test time. We call this family of meta-learning models for few-shot prediction Neural Processes (NPs). In addition to successfully learning how to carry out few-shot regression and classification, NPs produce uncertainty estimates and can generate coherent samples at arbitrary resolutions.
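To make the final contribution concrete: the abstract's "training on a distribution over datasets" can be illustrated with a Conditional Neural Process (CNP), the simplest deterministic member of the NP family. The sketch below is an assumption-laden toy, not the thesis code: it uses PyTorch, illustrative network sizes, and a hypothetical family of random sine tasks standing in for the distribution over datasets.

```python
# Minimal Conditional Neural Process sketch (illustrative, not the thesis code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNP(nn.Module):
    def __init__(self, x_dim=1, y_dim=1, r_dim=128):
        super().__init__()
        # Encoder: maps each context pair (x, y) to a representation r_i.
        self.encoder = nn.Sequential(
            nn.Linear(x_dim + y_dim, r_dim), nn.ReLU(),
            nn.Linear(r_dim, r_dim))
        # Decoder: conditions target inputs on the aggregated representation
        # and outputs a predictive mean and scale.
        self.decoder = nn.Sequential(
            nn.Linear(x_dim + r_dim, r_dim), nn.ReLU(),
            nn.Linear(r_dim, 2 * y_dim))

    def forward(self, x_ctx, y_ctx, x_tgt):
        r_i = self.encoder(torch.cat([x_ctx, y_ctx], dim=-1))  # [B, Nc, r_dim]
        r = r_i.mean(dim=1, keepdim=True)          # permutation-invariant aggregation
        r = r.expand(-1, x_tgt.shape[1], -1)       # broadcast to every target point
        out = self.decoder(torch.cat([x_tgt, r], dim=-1))
        mean, log_sigma = out.chunk(2, dim=-1)
        sigma = 0.1 + 0.9 * F.softplus(log_sigma)  # bounded scale for stability
        return torch.distributions.Normal(mean, sigma)

def sample_sine_task(batch=16, n_ctx=5, n_tgt=50):
    # Each batch element is a fresh dataset drawn from the task distribution:
    # here, a random-amplitude, random-phase sine function (an assumption).
    amp = torch.rand(batch, 1, 1) * 4.0 + 0.5
    phase = torch.rand(batch, 1, 1) * 3.14
    x = torch.rand(batch, n_ctx + n_tgt, 1) * 10.0 - 5.0
    y = amp * torch.sin(x + phase)
    return x[:, :n_ctx], y[:, :n_ctx], x[:, n_ctx:], y[:, n_ctx:]

model = CNP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(10000):
    x_c, y_c, x_t, y_t = sample_sine_task()
    dist = model(x_c, y_c, x_t)
    loss = -dist.log_prob(y_t).mean()  # maximise predictive log-likelihood
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the context is aggregated by a mean, the model is invariant to the ordering and number of observed points, which is what lets it adapt to a new dataset at test time from only a handful of observations and, via the predicted scale, report its uncertainty.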
Content Version: Open Access
Issue Date: Sep-2020
Date Awarded: Feb-2021
URI: http://hdl.handle.net/10044/1/88706
DOI: https://doi.org/10.25560/88706
Copyright Statement: Creative Commons Attribution NonCommercial NoDerivatives Licence
Supervisor: Shanahan, Murray
Sponsor/Funder: Engineering and Physical Sciences Research Council; DeepMind Technologies (Firm)
Department: Computing
Publisher: Imperial College London
Qualification Level: Doctoral
Qualification Name: Doctor of Philosophy (PhD)
Appears in Collections: Computing PhD theses


