Repository logo
  • Log In
    Log in via Symplectic to deposit your publication(s).
Repository logo
  • Communities & Collections
  • Research Outputs
  • Statistics
  • Log In
    Log in via Symplectic to deposit your publication(s).
  1. Home
  2. Faculty of Natural Sciences
  3. Mathematics
  4. Statistics
  5. Optimal convergence rate for exact policy mirror descent in discounted Markov decision processes
 
  • Details
Optimal convergence rate for exact policy mirror descent in discounted Markov decision processes
File(s)
convergence_pmd.pdf (424.32 KB)
Published version
Author(s)
Johnson, Emmeran
Pike-Burke, Ciara
Rebeschini, Patrick
Type
Conference Paper
Abstract
Policy Mirror Descent (PMD) is a general family of algorithms that covers a wide range of novel and fundamental methods in reinforcement learning. Motivated by the instability of policy iteration (PI) with inexact policy evaluation, PMD
algorithmically regularises the policy improvement step of PI. With exact policy evaluation, PI is known to converge linearly with a rate given by the discount factor γ of a Markov Decision Process. In this work, we bridge the gap between PI and PMD with exact policy evaluation and show that the dimension-free γ-rate of PI can be achieved by the general family of unregularised PMD algorithms under
an adaptive step-size. We show that both the rate and step-size are unimprovable for PMD: we provide matching lower bounds that demonstrate that the γ-rate is optimal for PMD methods as well as PI, and that the adaptive step-size is necessary for PMD to achieve it. Our work is the first to relate PMD to rate-optimality and step-size necessity. Our study of the convergence of PMD avoids the use of the
performance difference lemma, which leads to a direct analysis of independent interest. We also extend the analysis to the inexact setting and establish the first
dimension-optimal sample complexity for unregularised PMD under a generative model, improving upon the best-known result.
Date Issued
2023
Date Acceptance
2023-09-21
Citation
Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023, 36, pp.76496-76524
URI
http://hdl.handle.net/10044/1/107348
URL
https://proceedings.neurips.cc/paper_files/paper/2023/hash/f0d7b528c31bc3f9a0d5bab515ed6ed5-Abstract-Conference.html
Publisher
Curran Associates, Inc.
Start Page
76496
End Page
76524
Journal / Book Title
Advances in Neural Information Processing Systems 36 (NeurIPS 2023)
Volume
36
Copyright Statement
Copyright © 2023 The Author(s).
Identifier
https://proceedings.neurips.cc/paper_files/paper/2023/hash/f0d7b528c31bc3f9a0d5bab515ed6ed5-Abstract-Conference.html
Source
Neural Information Processing Systems (NeurIPS 2023)
Publication Status
Published
Start Date
2023-12-11
Finish Date
2023-12-16
Coverage Spatial
New Orleans, LA, USA
Date Publish Online
2023
About
Spiral Depositing with Spiral Publishing with Spiral Symplectic
Contact us
Open access team Report an issue
Other Services
Scholarly Communications Library Services
logo

Imperial College London

South Kensington Campus

London SW7 2AZ, UK

tel: +44 (0)20 7589 5111

Accessibility Modern slavery statement Cookie Policy

Built with DSpace-CRIS software - Extension maintained and optimized by 4Science

  • Cookie settings
  • Privacy policy
  • End User Agreement
  • Send Feedback