Quantile Markov decision process

Publication available at: https://arxiv.org/abs/1711.05788
Title: Quantile Markov decision process
Authors: Li, X; Zhong, H; Brandeau, M
Item Type: Working Paper
Abstract: The goal of a traditional Markov decision process (MDP) is to maximize expected cumulative reward over a defined horizon (possibly infinite). In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward instead of its expectation. In this paper we consider the problem of optimizing the quantiles of the cumulative rewards of a Markov decision process (MDP), which we refer to as a quantile Markov decision process (QMDP). We provide analytical results characterizing the optimal QMDP value function and present a dynamic programming-based algorithm to solve for the optimal policy. The algorithm also extends to the MDP problem with a conditional value-at-risk (CVaR) objective. We illustrate the practical relevance of our model by evaluating it on an HIV treatment initiation problem, where patients aim to balance the potential benefits and risks of the treatment.
Issue Date: 4-Aug-2020
URI: http://hdl.handle.net/10044/1/88353
ISSN: 0030-364X
Publisher: Institute for Operations Research and the Management Sciences
Keywords: Operations Research; 0102 Applied Mathematics; 0802 Computation Theory and Mathematics; 1503 Business and Management
Publication Status: Published
Open Access location: https://arxiv.org/abs/1711.05788
Appears in Collections: Imperial College Business School

Unless otherwise indicated, items in Spiral are protected by copyright and are licensed under a Creative Commons Attribution NonCommercial NoDerivatives License.
