Bayesian distributional policy gradients
File(s)
AAAI21_V1-2disclaimer.pdf (1.31 MB)
Accepted version
Author(s)
Li, Luchen
Faisal, Aldo
Type
Conference Paper
Abstract
Distributional reinforcement learning (Distributional RL) maintains the entire probability distribution of the reward-to-go, i.e. the return, providing a more principled approach to account for the uncertainty associated with policy performance, which may be beneficial for trading off exploration and exploitation and for policy learning in general. Previous work in distributional RL focused mainly on computing the state-action-return distributions; here we model the state-return distributions. This enables us to translate successful conventional RL algorithms that are based on state values into distributional RL. We formulate the distributional Bellman operation as an inference-based auto-encoding process that minimises Wasserstein metrics between target/model return distributions. Our algorithm, BDPG (Bayesian Distributional Policy Gradients), uses adversarial training in joint-contrastive learning to learn a variational posterior from the returns. Moreover, we can now interpret the return prediction uncertainty as an information gain, which allows us to obtain a new curiosity measure that helps BDPG steer exploration actively and efficiently. In our experiments on Atari 2600 games and MuJoCo tasks, we demonstrate that BDPG generally learns faster and with higher asymptotic performance than reference distributional RL algorithms, including on well-known hard-exploration tasks.
Date Acceptance
2020-12-02
Citation
Proceedings of the AAAI Conference on Artificial Intelligence
ISSN
2159-5399
Publisher
AAAI
Journal / Book Title
Proceedings of the AAAI Conference on Artificial Intelligence
Copyright Statement
© 2021, Association for the Advancement of Artificial Intelligence.
Source
AAAI Conference on Artificial Intelligence
Publication Status
Accepted
Start Date
2021-02-02
Finish Date
2021-02-09
Coverage Spatial
Vancouver, Canada (Virtual)
Date Publish Online
2021-05-18