Reinforcement learning with dynamic convex risk measures
Author(s)
Coache, Anthony
Jaimungal, Sebastian
Type
Journal Article
Abstract
We develop an approach for solving time-consistent risk-sensitive stochastic optimization problems using model-free reinforcement learning (RL). Specifically, we assume agents assess the risk of a sequence of random variables using dynamic convex risk measures. We employ a time-consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules that aid in obtaining optimal policies. We further develop an actor–critic style algorithm using neural networks to optimize over policies. Finally, we demonstrate the performance and flexibility of our approach by applying it to three optimization problems: statistical arbitrage trading strategies, financial hedging, and obstacle avoidance robot control.
Date Issued
2024-04
Date Acceptance
2023-03-14
Citation
Mathematical Finance, 2024, 34 (2), pp.557-587
ISSN
0960-1627
Publisher
Wiley
Start Page
557
End Page
587
Journal / Book Title
Mathematical Finance
Volume
34
Issue
2
Copyright Statement
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
© 2023 The Authors. Mathematical Finance published by Wiley Periodicals LLC.
Identifier
http://dx.doi.org/10.1111/mafi.12388
Publication Status
Published
Date Publish Online
2023-04-17