Efficient exploitation of hierarchical structure in sparse reward reinforcement learning
File(s): AISTATS_HRL (2).pdf (2.34 MB)
Published version
Author(s)
Type
Conference Paper
Abstract
We study goal-conditioned Hierarchical Reinforcement Learning (HRL), where a high-level agent assigns sub-goals to a low-level agent. Under the assumptions of a sparse reward function and a known hierarchical decomposition, we propose a new algorithm for learning optimal hierarchical policies. Our algorithm takes a low-level policy as input and is flexible enough to work with a wide range of low-level policies. We show that when the low-level policy is optimistic and provably efficient, our HRL algorithm enjoys a regret bound that significantly improves on previous results for HRL. Importantly, our regret upper bound highlights the key characteristics of the hierarchical decomposition that guarantee that our hierarchical algorithm is more efficient than the best monolithic approach. We support our theoretical findings with experiments showing that our method consistently outperforms algorithms that ignore the hierarchical structure.
Date of Acceptance
2025-01-21
Publisher
ML Research Press
Source
International Conference on Artificial Intelligence and Statistics (AISTATS)
Publication Status
Accepted
Start Date
2025-05-03
Finish Date
2025-05-05
Location
Mai Khao, Thailand