Improving sample efficiency in deep reinforcement learning
File(s)
Dai-T-2022-PhD-Thesis.pdf (12.92 MB)
Thesis
Author(s)
Dai, Tianhong
Type
Thesis
Abstract
Deep reinforcement learning (DRL) has made great progress in dealing with complex control problems in various test scenarios, such as playing video games, playing board games, and dexterous robotic manipulation, with the promise of critical real-world applications, such as controlling plasmas for nuclear fusion. However, DRL requires a large number of interactions with an environment to find an optimal policy that solves the task, which limits its application to real-world problems. In this thesis, we focus on two aspects of improving sample efficiency in DRL: 1) solving sparse reward tasks and 2) improving general exploration strategies.

First, we analyse agents trained with and without domain randomisation (DR), a technique that can reduce the reality gap between a simulator and real-world scenarios. By evaluating their robustness to previously unseen environments and applying both qualitative and quantitative interpretability methods, we provide insight into the behaviour of the trained agents. We also give some suggestions to researchers who intend to adopt interpretability methods to analyse DRL agents.

Second, we propose two methods to overcome exploration difficulties and improve learning efficiency in goal-oriented RL with the sparse reward setting, where an agent can rarely achieve positive feedback. In the first method, to provide sufficient positive samples for training an agent, hindsight goal relabelling is used to replace goals in original samples with intermediate goals, and these augmented positive samples are leveraged to accelerate the training via a self-imitation learning paradigm. An additional selection module is also designed to remove undesirable modified samples and stabilise training. In the second method, to alleviate the inefficiency of hindsight experience replay (HER) caused by its uniform sampling strategy, a diversity-based sampling method is employed to select valuable and diverse experiences for efficient training.
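
As a rough illustration of the hindsight goal relabelling idea mentioned above, the sketch below replaces the goal of each stored transition with a goal that was actually achieved later in the same episode, then recomputes a sparse reward. This is not the thesis's implementation: it follows the commonly used "future" relabelling strategy, and the names `sparse_reward`, `relabel_with_future_goals`, the dictionary keys, the tolerance `tol`, and the 0/-1 reward convention are all assumptions made for the example. The self-imitation learning step and the selection module described in the abstract are not shown.

```python
import numpy as np


def sparse_reward(achieved_goal, goal, tol=0.05):
    """Assumed convention: 0 if the goal is reached within tol, otherwise -1."""
    distance = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(goal))
    return 0.0 if distance <= tol else -1.0


def relabel_with_future_goals(episode, rng, k=4):
    """Return k relabelled copies of every transition in the episode.

    episode is a list of dicts with the (hypothetical) keys
    "obs", "action", "achieved_goal" and "goal", where "achieved_goal"
    is the goal reached after taking the action.
    """
    relabelled = []
    horizon = len(episode)
    for t, transition in enumerate(episode):
        # Sample k time steps from the current step to the end of the episode.
        for idx in rng.integers(t, horizon, size=k):
            new_goal = episode[idx]["achieved_goal"]
            relabelled.append({
                "obs": transition["obs"],
                "action": transition["action"],
                "achieved_goal": transition["achieved_goal"],
                "goal": new_goal,
                "reward": sparse_reward(transition["achieved_goal"], new_goal),
            })
    return relabelled
```

Even when the original goal is never reached, some relabelled transitions receive the success reward, which is what provides positive samples in a sparse reward setting.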

Furthermore, diversity-augmented intrinsic motivation is introduced to encourage the agent to explore novel states in an environment with sparse or delayed rewards. During training, the diversity of adjacent state sequences is measured under the framework of determinantal point processes (DPPs) and this measurement is used as an auxiliary reward to facilitate the exploration of the agent, thus improving the final performance.
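
To make the DPP-based diversity measure more concrete, here is a minimal sketch under stated assumptions. In a DPP with an L-ensemble kernel, the probability of a set of items is proportional to the determinant of the kernel restricted to that set, so a larger determinant indicates a more diverse set. The example scores a short window of adjacent states by the determinant of their cosine-similarity Gram matrix and adds it to the extrinsic reward. The kernel choice, the window representation, the weight `beta`, and the function names `dpp_diversity` and `augmented_reward` are hypothetical and are not taken from the thesis.

```python
import numpy as np


def dpp_diversity(states):
    """Determinant of the cosine-similarity Gram matrix of a window of states.

    The value is close to 1 for mutually orthogonal states and close to 0
    for nearly identical ones (assumed kernel, chosen for illustration).
    """
    x = np.asarray(states, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / np.maximum(norms, 1e-12)  # unit-normalise each state vector
    gram = x @ x.T
    # Clip tiny negative values caused by floating-point error.
    return max(float(np.linalg.det(gram)), 0.0)


def augmented_reward(extrinsic, states_window, beta=0.1):
    """Extrinsic reward plus a weighted diversity bonus (beta is an assumption)."""
    return extrinsic + beta * dpp_diversity(states_window)
```

With this choice of kernel the bonus is bounded, so `beta` directly controls how strongly exploration is rewarded relative to the task reward.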
Version
Open Access
Date Issued
2022-05
Date Awarded
2022-09
URI
http://hdl.handle.net/10044/1/100136
DOI
https://doi.org/10.25560/100136
Copyright Statement
Creative Commons Attribution NonCommercial Licence
License URL
Attribution-NonCommercial 4.0 International
Advisor
Bharath, Anil
Publisher Department
Bioengineering
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)