Continual reinforcement learning with memory at multiple timescales
Author(s)
Kaplanis, Christos
Type
Thesis or dissertation
Abstract
In the past decade, with increased availability of computational resources and several improvements in training techniques, artificial neural networks (ANNs) have been rediscovered as a powerful class of machine learning methods, featuring in several groundbreaking applications of artificial intelligence. Most of these successes have been achieved in stationary, confined domains, such as game playing and image recognition, but, ultimately, we want to apply artificial intelligence to problems that require it to interact with the real world, which is both vast and non-stationary. Unfortunately, ANNs have long been known to suffer from the phenomenon of catastrophic forgetting, whereby, in a setting where the data distribution is changing over time, new learning can lead to an abrupt erasure of previously acquired knowledge. The resurgence of ANNs has led to an increased urgency to solve this problem and endow them with the capacity for continual learning: the ability to build on their knowledge over time in environments that are constantly evolving. To date, the most common setting for evaluating continual learning approaches has been training on a number of distinct tasks in sequence, and as a result many of them use knowledge of task boundaries to consolidate knowledge during training. In the real world, however, changes to the distribution may occur more gradually and at times that are not known in advance.
The goal of this thesis has been to develop continual learning approaches that can cope with both discrete and continuous changes to the data distribution, without any prior knowledge of the nature or timescale of the changes. I present three new methods, all of which involve learning at multiple timescales, and evaluate them in the context of deep reinforcement learning, a paradigm that combines reinforcement learning with neural networks. Deep reinforcement learning provides a natural testbed for continual learning because (i) it involves interacting with an environment, and (ii) it can feature non-stationarity at unpredictable timescales even during training on a single task. The first method is inspired by the process of synaptic consolidation in the brain and involves multi-timescale memory at the level of the parameters of the network; the second extends the first by directly consolidating the agent's policy over time, rather than its individual parameters; finally, the third approach extends the experience replay database, which typically maintains a buffer of the agent's most recent experiences in order to decorrelate them during training, by enabling it to store data over multiple timescales.
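The abstract only outlines these mechanisms, so as a rough illustration of the third method, here is a minimal sketch (in Python) of one way a replay buffer could store data over multiple timescales: a cascade of FIFO sub-buffers in which each transition evicted from a full sub-buffer is either promoted to the next, slower sub-buffer with some fixed probability or discarded, so older experiences are thinned out rather than erased all at once. The names and parameters here (MultiTimescaleReplayBuffer, pass_prob, and so on) are illustrative assumptions, not the implementation from the thesis.

```python
import random
from collections import deque


class MultiTimescaleReplayBuffer:
    """Cascade of FIFO sub-buffers spanning progressively longer timescales.

    Level 0 holds the most recent transitions. When a level overflows,
    the oldest transition is promoted to the next (slower) level with
    probability `pass_prob`, otherwise discarded, so old experiences are
    progressively thinned out instead of being dropped all at once.
    Hypothetical sketch, not the thesis's implementation.
    """

    def __init__(self, num_levels=4, level_capacity=10_000, pass_prob=0.5):
        self.levels = [deque() for _ in range(num_levels)]
        self.level_capacity = level_capacity
        self.pass_prob = pass_prob

    def add(self, transition):
        item = transition
        for level in self.levels:
            level.append(item)
            if len(level) <= self.level_capacity:
                return
            item = level.popleft()  # FIFO eviction of the oldest item
            if random.random() >= self.pass_prob:
                return  # discarded: this is where old data gets thinned
        # item fell off the slowest level and is forgotten for good

    def sample(self, batch_size):
        """Draw a batch mixing recent and old experiences."""
        pool = [t for level in self.levels for t in level]
        return random.sample(pool, min(batch_size, len(pool)))
```

With pass_prob = 0.5 and equal-sized levels, the expected age of the data held in each successive level roughly doubles, which is what lets a single buffer cover several timescales at once without knowing in advance when the distribution will shift.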
Version
Open Access
Date Issued
2020-04
Date Awarded
2020-08
Copyright Statement
Creative Commons Attribution NonCommercial ShareAlike Licence
Advisor
Shanahan, Murray
Clopath, Claudia
Sponsor
Engineering and Physical Sciences Research Council
Publisher Department
Computing
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)