On structural and temporal credit assignment in reinforcement learning
File | Description | Size | Format
---|---|---|---
Tavakoli-A-2021-PhD-Thesis.pdf | Thesis | 9.39 MB | Adobe PDF
Title: On structural and temporal credit assignment in reinforcement learning
Authors: Tavakoli, Arash
Item Type: Thesis or dissertation
Abstract: Reinforcement learning, or learning how to map situations to actions that maximise a numerical reward signal, poses two fundamental, interdependent problems: exploration and credit assignment. The exploration problem concerns an agent's ability to discover useful experiences. The credit assignment problem concerns an agent's ability to incorporate the discovered experiences; it itself comprises two distinct subproblems: structural and temporal credit assignment. Structural credit assignment involves determining how to assign credit for the outcome of an action to the many component structures, or internal decisions, that could have been involved in producing that action. Temporal credit assignment involves determining how to assign credit for the outcomes of a sequence of experiences to the actions that could have contributed to those outcomes. In this thesis, we broadly study the credit assignment problem in reinforcement learning, contributing to each of its subproblems in isolation.

In the first part of this thesis, we address the reinforcement learning problem in environments with multi-dimensional discrete action spaces, a setting in which structural credit assignment, or generalisation, is hampered by Bellman's curse of dimensionality. We argue that leveraging the combinatorial structure of such action spaces is crucial for achieving rapid generalisation from limited data. To this end, we introduce two approaches for estimating action values that can leverage such structures, in each case empirically validating that significant improvements in sample complexity can be gained. Furthermore, we demonstrate that our approaches offer significant benefits in space and time complexity, allowing them to scale to high-dimensional discrete action spaces where the conventional approach becomes computationally intractable.

In the second part of this thesis, we address the temporal credit assignment problem. Specifically, we identify and analyse general training scenarios in which appropriate temporal credit assignment is hindered by the mishandling of time limits or by the choice of discount factor. On the first matter, we formalise the ways in which time limits may be interpreted in reinforcement learning and how they should accordingly be handled in each case. On the second matter, we propose a possible explanation for why performance tends to degrade when low discount factors are used in conjunction with function approximation. In turn, this leads us to develop a method that enables a much larger range of discount factors by rectifying the hypothesised root cause.

(Hedged code sketches of the two core ideas above follow this record.)
Content Version: Open Access
Issue Date: May-2021
Date Awarded: Jan-2022
URI: http://hdl.handle.net/10044/1/95430
DOI: https://doi.org/10.25560/95430
Copyright Statement: Creative Commons Attribution NonCommercial Licence
Supervisor: Kormushev, Petar; Aurisicchio, Marco
Sponsor/Funder: Engineering and Physical Sciences Research Council
Department: Dyson School of Design Engineering
Publisher: Imperial College London
Qualification Level: Doctoral
Qualification Name: Doctor of Philosophy (PhD)
Appears in Collections: Design Engineering PhD theses
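
The first part of the abstract concerns estimating action values in multi-dimensional discrete action spaces by exploiting their combinatorial structure. The following PyTorch snippet is a minimal sketch of that general idea, not the thesis's actual architectures: it keeps one small Q-head per action dimension over a shared torso, so the number of outputs grows linearly rather than exponentially with the number of dimensions. The class name, layer sizes, and per-branch greedy rule are all illustrative assumptions.

```python
import torch
from torch import nn

class BranchingQNetwork(nn.Module):
    """Sketch of a per-dimension ("branching") action-value estimator.

    Instead of one output head over the full joint action space
    (prod_d n_d combinations), keep one small head per action dimension,
    so parameters and outputs grow linearly in the number of dimensions.
    """

    def __init__(self, obs_dim: int, branch_sizes: list[int], hidden: int = 128):
        super().__init__()
        # Shared torso: one state representation feeds every branch.
        self.torso = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One independent linear head per action dimension.
        self.branches = nn.ModuleList(nn.Linear(hidden, n) for n in branch_sizes)

    def forward(self, obs: torch.Tensor) -> list[torch.Tensor]:
        z = self.torso(obs)
        # One Q-vector per dimension; a greedy joint action is assembled
        # from the per-branch argmaxes.
        return [branch(z) for branch in self.branches]

# Usage: 3 dimensions with 5 choices each gives 3 * 5 = 15 outputs,
# versus 5 ** 3 = 125 outputs for a head over the joint action space.
net = BranchingQNetwork(obs_dim=8, branch_sizes=[5, 5, 5])
qs = net(torch.randn(1, 8))
action = [q.argmax(dim=-1).item() for q in qs]
```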
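
The second part formalises how time limits may be interpreted. Below is a minimal sketch of that distinction for a one-step TD target, under assumed names (`td_target`, `done`, `truncated`) rather than the thesis's notation: when the limit is part of the task, a timeout is a genuine terminal state and the target should not bootstrap (and the remaining time should appear in the agent's observation); when the limit merely truncates data collection, the target should still bootstrap from the next state's value.

```python
def td_target(reward: float, next_value: float,
              done: bool, truncated: bool, gamma: float = 0.99) -> float:
    """One-step TD target distinguishing termination from truncation.

    done      -- the task itself ended. In the time-aware interpretation,
                 the time limit belongs to the task, so a timeout is an
                 ordinary terminal state (done=True, truncated=False).
    truncated -- the episode was cut short by a limit that is NOT part of
                 the task (time-unaware interpretation): data collection
                 stopped, but the task continues beyond this point.
    """
    if done and not truncated:
        return reward                        # true terminal: no future value
    # Non-terminal step, or truncation by an external time limit:
    # keep bootstrapping from the estimated value of the next state.
    return reward + gamma * next_value

# Timeout in a time-unaware task: the episode ends, but we still bootstrap.
print(td_target(1.0, 5.0, done=True, truncated=True))    # 5.95
# True terminal state (natural end, or a time-aware timeout): no bootstrap.
print(td_target(1.0, 5.0, done=True, truncated=False))   # 1.0
```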