Towards better data efficiency in deep reinforcement learning
| File | Description | Size | Format | |
| --- | --- | --- | --- | --- |
| Dilokthanakul-N-2019-PhD-Thesis.pdf | Thesis | 9.26 MB | Adobe PDF | View/Open |
Title: | Towards better data efficiency in deep reinforcement learning |
Authors: | Dilokthanakul, Nat |
Item Type: | Thesis or dissertation |
Abstract: | Deep Reinforcement Learning (DRL) is a machine learning paradigm that uses deep neural networks as one of its main components to search for reward-directed behaviours. Although DRL has been successful in many high-dimensional and difficult tasks, several challenges remain in bridging the gap between human-level learning ability and DRL. One of its weaknesses is its data-hungry nature, which makes it impractical in real-world scenarios. In this thesis, three main causes of data inefficiency in DRL are explored: (i) the sparse reward problem, (ii) the exploration problem and (iii) the representation problem. Towards solving these problems, a suite of proposed algorithms and models is studied: (i) The first proposed method is a hierarchical model with two types of intrinsic motivation: feature-control and pixel-control. Models with these intrinsic motivations are shown to be effective in sparse reward tasks, and an empirical study suggests that the successes on sparse reward tasks come from the extra training signals that originate from the intrinsic rewards. (ii) Next, an exploration strategy based on the optimism in the face of uncertainty (OFU) principle is proposed. In this method, the uncertainty of interest is the uncertainty on the return, which is relatively easy to measure. Experiments show that the method works well in Montezuma's Revenge, a notoriously difficult exploration game. Weaknesses of the method, such as potentially sub-optimal behaviour in stochastic environments, are also discussed. (iii) Deep neural networks are known to exhibit the forgetting problem during learning, which demonstrates their inefficiency as representational models. This study aims at understanding the relationship between neural network architectures and their forgetting behaviours, which lead to poor generalisability and data inefficiency. It is found that specific weight-sharing structures can moderately alleviate the forgetting problem. (iv) To move towards more generalisable representations in DRL, disentangled representation learning models are a promising candidate. A deep generative model, the Gaussian Mixture Variational Autoencoder (GMVAE), which represents data with both discrete and continuous variables, is proposed as a potential method for achieving generalisable representations. A study of the model on a digit dataset reveals that it successfully learns interpretable categorical groupings and meaningful continuous variables; major problems associated with training such a model are also discussed. (v) Additionally, a framework for adding inductive biases to a generative model is proposed. This framework is shown to create latent variable models that disentangle local and global information in image datasets, providing an additional method for building latent variable models with explicit information placement in the latent variables. Finally, the thesis concludes with a review of related work in the field and suggests future directions that could help refine a solution to the data-inefficiency problem in DRL. |
Content Version: | Open Access |
Issue Date: | Oct-2018 |
Date Awarded: | Feb-2019 |
URI: | http://hdl.handle.net/10044/1/67778 |
DOI: | https://doi.org/10.25560/67778 |
Copyright Statement: | Creative Commons Attribution NonCommercial Licence |
Supervisor: | Shanahan, Murray; Deisenroth, Marc |
Sponsor/Funder: | Royal Thai Scholarship |
Department: | Computing |
Publisher: | Imperial College London |
Qualification Level: | Doctoral |
Qualification Name: | Doctor of Philosophy (PhD) |
Appears in Collections: | Computing PhD theses |
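The OFU-style exploration strategy summarised in point (ii) of the abstract scores actions optimistically using an estimate of the uncertainty on the return. A minimal sketch of that idea, assuming an ensemble of return estimates as the uncertainty proxy (the thesis's exact estimator is not reproduced here, and `ofu_action` and `beta` are illustrative names):

```python
import statistics

def ofu_action(return_samples, beta=1.0):
    """Select the action with the highest optimistic return score.

    return_samples: dict mapping each action to a list of return
    estimates (e.g. from an ensemble of value heads). The score is
    mean + beta * std, so actions whose return is uncertain receive
    an exploration bonus; beta=0 recovers greedy action selection.
    """
    best_action, best_score = None, float("-inf")
    for action, samples in return_samples.items():
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples) if len(samples) > 1 else 0.0
        score = mean + beta * std
        if score > best_score:
            best_action, best_score = action, score
    return best_action

# With optimism (beta=1), the uncertain action "b" wins; without it,
# the higher-mean action "a" is chosen.
estimates = {"a": [1.0, 1.0, 1.0], "b": [0.0, 1.6]}
print(ofu_action(estimates, beta=1.0))  # "b"
print(ofu_action(estimates, beta=0.0))  # "a"
```

As the abstract notes, this kind of optimism can be misled in stochastic environments, where reward noise inflates the measured return uncertainty without indicating anything left to learn.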