Parameter Space Abstractions for Diversity-based Policy Search

File: Rakicevic-N-2022-PhD-Thesis.pdf
Description: Thesis
Size: 41.51 MB
Format: Adobe PDF
Title: Parameter Space Abstractions for Diversity-based Policy Search
Authors: Rakicevic, Nemanja
Item Type: Thesis or dissertation
Abstract: Many modern systems can be modelled as sequential decision-making processes, such as autonomous robots, video game bots, cooling plant control systems, financial trading algorithms, and others. These systems consist of an autonomous agent making decisions about which actions to execute in an environment. An essential component of a sequential decision-making process is the agent’s policy, which defines how the agent makes decisions in every situation. The policy can be manually defined, or learned from experience, in order to make the agent fully autonomous. In certain cases where the environment dynamics change dramatically, due to moving obstacles or partial agent damage, a single policy may not be sufficient. Therefore, maintaining a diversity of policies is necessary to provide alternatives for the system to function normally. Prior work in the fields of reinforcement learning and quality-diversity shows that a diversity of policies can be learned from interaction experience. Diversity can be achieved either through a single task-conditioned policy or by maintaining a collection of policies. The main limitation of these approaches is that they tend to be sample-inefficient, because they require a large amount of interaction data and because they perform search in a high-dimensional policy parameter space. This thesis presents a novel perspective on diversity-based policy search. The novelty is to model abstractions of the policy parameter space in order to improve the diversity-based policy search. Abstractions are alternative representations of the search space that offer certain characteristics useful for the policy search process. This topic is split into two parts. The first part of the thesis focuses on approaches towards modelling abstractions over the movement policy parameterisation space, in order to improve policy search in the original policy parameterisation space, in a task-agnostic setting. 
The abstractions are implemented as forward models, which map the movement policy parameterisation space to the trial outcomes achieved by the corresponding policy. The properties of the learned forward model are used to iteratively guide the parameter search process, according to the active learning framework, which leads to the model’s further improvement, without using any additional task information. The second part of the thesis focuses on approaches to modelling abstractions of the policy parameter space, in the context of policies parameterised as neural networks, and subsequently performing diversity-based policy search in the abstracted parameter space. The main goal is to reduce the high-dimensional neural network parameter search space, by investigating the manifold hypothesis within the policy parameter space. The manifold hypothesis states that there exists a low-dimensional manifold embedded in the high-dimensional policy parameter space around which a high density of solutions can be found. The learned manifold maps, i.e. policy parameter representations, are used for diversity-based policy search. Several insights are provided into the properties of the policy parameter space which can be used to improve the policy search, including the factors affecting the final representation quality. The parameter dataset generation process and the manifold learning model used to define the policy parameter representations are thoroughly examined to gain a deeper understanding of the parameter space representation learning process. The proposed approaches were evaluated in simulated environments and real robot setups, showing superior performance compared to baseline and state-of-the-art methods. The main insights of the thesis show that properly modelled abstractions of the parameter search space can help improve the diversity-based policy search. 
The novel insights shown in this thesis open up future research avenues for policy search approaches in learned manifolds as well as potential applications to neural network optimisation.
Content Version: Open Access
Issue Date: Jun-2021
Date Awarded: Apr-2022
URI: http://hdl.handle.net/10044/1/96985
DOI: https://doi.org/10.25560/96985
Copyright Statement: Creative Commons Attribution NonCommercial Licence
Supervisor: Kormushev, Petar; Childs, Peter
Department: Dyson School of Design Engineering
Publisher: Imperial College London
Qualification Level: Doctoral
Qualification Name: Doctor of Philosophy (PhD)
Appears in Collections: Design Engineering PhD theses



This item is licensed under a Creative Commons License.