Data-efficient reinforcement learning with probabilistic model predictive control
File(s)
kamthe18a.pdf (515.49 KB)
Published version
Author(s)
Kamthe, Sanket
Deisenroth, Marc Peter
Type
Conference Paper
Abstract
Trial-and-error based reinforcement learning (RL) has seen rapid advancements in recent times, especially with the advent of deep neural networks. However, the majority of autonomous RL algorithms require a large number of interactions with the environment. A large number of interactions may be impractical in many real-world applications, such as robotics, and many practical systems have to obey limitations in the form of state space or control constraints. To reduce the number of system interactions while simultaneously handling constraints, we propose a model-based RL framework based on probabilistic Model Predictive Control (MPC). In particular, we propose to learn a probabilistic transition model using Gaussian Processes (GPs) to incorporate model uncertainty into long-term predictions, thereby reducing the impact of model errors. We then use MPC to find a control sequence that minimises the expected long-term cost. We provide theoretical guarantees for first-order optimality in the GP-based transition models with deterministic approximate inference for long-term planning. We demonstrate that our approach not only achieves state-of-the-art data efficiency, but is also a principled way for RL in constrained environments.
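The pipeline the abstract describes (learn a GP transition model, propagate uncertainty over the planning horizon, choose controls that minimise expected cost) can be illustrated with a minimal sketch. This is not the authors' implementation: the 1-D toy system, the naive variance accumulation standing in for the paper's moment-matching inference, and the random-shooting optimiser standing in for their gradient-based planner are all assumptions for illustration, as are the function names.

```python
import numpy as np

def gp_predict(X, y, x_star, lengthscale=1.0, signal_var=1.0, noise_var=1e-2):
    """Standard GP regression posterior mean/variance with an RBF kernel."""
    def k(a, b):
        d = a[:, None] - b[None, :]
        return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)
    K = k(X, X) + noise_var * np.eye(len(X))
    k_star = k(X, x_star)
    alpha = np.linalg.solve(K, y)
    mean = k_star.T @ alpha
    v = np.linalg.solve(K, k_star)
    var = signal_var - np.sum(k_star * v, axis=0) + noise_var
    return mean, np.maximum(var, 1e-12)

def rollout_cost(X, y, x0, controls, target=0.0):
    """Propagate the GP model over the horizon, accumulating expected cost.

    Predictive variance enters the quadratic cost, so uncertain
    trajectories are penalised. Note: the naive variance accumulation
    below is a crude stand-in for the paper's deterministic
    moment-matching propagation.
    """
    mean, var = x0, 0.0
    cost = 0.0
    for u in controls:
        m, v = gp_predict(X, y, np.array([mean + u]))
        mean, var = float(m[0]), float(v[0]) + var  # naive uncertainty growth
        cost += (mean - target) ** 2 + var  # E[(x - t)^2] = (m - t)^2 + var
    return cost

def mpc_plan(X, y, x0, horizon=5, n_candidates=200, rng=None):
    """Random-shooting MPC: sample control sequences, keep the cheapest."""
    if rng is None:
        rng = np.random.default_rng(0)
    best_u, best_c = None, np.inf
    for _ in range(n_candidates):
        u_seq = rng.uniform(-1.0, 1.0, size=horizon)
        c = rollout_cost(X, y, x0, u_seq)
        if c < best_c:
            best_u, best_c = u_seq, c
    return best_u

# Toy transition data: x_{t+1} = 0.9 * (x_t + u_t) plus noise.
rng = np.random.default_rng(1)
inputs = rng.uniform(-2, 2, size=30)  # observed x_t + u_t
targets = 0.9 * inputs + 0.05 * rng.standard_normal(30)
plan = mpc_plan(inputs, targets, x0=1.5)
print("first planned control:", plan[0])
```

In the receding-horizon setting, only the first control of the returned sequence would be applied before replanning, and each newly observed transition would be appended to the GP's training set, which is the source of the method's data efficiency.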
Editor(s)
Storkey, Amos J
Pérez-Cruz, Fernando
Date Issued
2018-04-15
Date Acceptance
2018-01-08
Citation
Proceedings of Machine Learning Research, 2018, 84, pp. 1701-1710
Publisher
PMLR
Start Page
1701
End Page
1710
Journal / Book Title
Proceedings of Machine Learning Research
Volume
84
Copyright Statement
© 2018 by the author(s).
Identifier
http://proceedings.mlr.press/v84/
Source
Artificial Intelligence and Statistics
Start Date
2018-04-09
Finish Date
2018-04-11
Coverage Spatial
Lanzarote, Canary Islands
OA Location
https://arxiv.org/pdf/1706.06491
Date Publish Online
2018-04-15