Action branching architectures for deep reinforcement learning
File(s)
Tavakoli_AAAI-2018[1].pdf (3.54 MB)
Accepted version
OA Location
Author(s)
Tavakoli, Arash
Pardo, Fabio
Kormushev, Petar
Type
Conference Paper
Abstract
Discrete-action algorithms have been central to numerous recent successes of deep reinforcement learning. However, applying these algorithms to high-dimensional action tasks requires tackling the combinatorial increase of the number of possible actions with the number of action dimensions. This problem is further exacerbated for continuous-action tasks that require fine control of actions via discretization. In this paper, we propose a novel neural architecture featuring a shared decision module followed by several network branches, one for each action dimension. This approach achieves a linear increase of the number of network outputs with the number of degrees of freedom by allowing a level of independence for each individual action dimension. To illustrate the approach, we present a novel agent, called Branching Dueling Q-Network (BDQ), as a branching variant of the Dueling Double Deep Q-Network (Dueling DDQN). We evaluate the performance of our agent on a set of challenging continuous control tasks. The empirical results show that the proposed agent scales gracefully to environments with increasing action dimensionality and indicate the significance of the shared decision module in coordination of the distributed action branches. Furthermore, we show that the proposed agent performs competitively against a state-of-the-art continuous control algorithm, Deep Deterministic Policy Gradient (DDPG).
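To make the branching idea concrete, the following is a minimal PyTorch-style sketch of a shared decision module feeding one advantage branch per action dimension, combined with a shared state-value stream in a dueling fashion. This is an illustrative sketch under stated assumptions, not the authors' implementation: the names BranchingQNet, n_dims, n_bins, and the layer sizes are hypothetical, and the per-branch aggregation shown (subtracting each branch's mean advantage) is just one plausible choice.

import torch
import torch.nn as nn

class BranchingQNet(nn.Module):
    """Illustrative branching Q-network: shared trunk, per-dimension branches."""

    def __init__(self, obs_dim, n_dims, n_bins, hidden=128):
        super().__init__()
        # Shared decision module
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # Shared state-value stream (dueling)
        self.value = nn.Linear(hidden, 1)
        # One advantage branch per action dimension, so the number of
        # outputs grows linearly (n_dims * n_bins) rather than n_bins ** n_dims
        self.branches = nn.ModuleList(
            [nn.Linear(hidden, n_bins) for _ in range(n_dims)]
        )

    def forward(self, obs):
        h = self.trunk(obs)
        v = self.value(h)                               # (batch, 1)
        q_per_dim = []
        for branch in self.branches:
            adv = branch(h)                             # (batch, n_bins)
            # Per-branch dueling combination: Q_d = V + (A_d - mean A_d)
            q_per_dim.append(v + adv - adv.mean(dim=1, keepdim=True))
        return torch.stack(q_per_dim, dim=1)            # (batch, n_dims, n_bins)

# Greedy action selection is an independent argmax in each branch,
# one discrete sub-action per action dimension.
net = BranchingQNet(obs_dim=8, n_dims=4, n_bins=7)
q = net(torch.randn(2, 8))
action = q.argmax(dim=-1)                               # (batch, n_dims)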
Date Issued
2018-02-02
Date Acceptance
2017-11-09
Citation
Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI 2018), 2018
Publisher
AAAI
Journal / Book Title
Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI 2018)
Copyright Statement
© 2018, Association for the Advancement of Artificial
Intelligence (www.aaai.org). All rights reserved.
Identifier
http://kormushev.com/papers/Tavakoli_AAAI-2018.pdf
Source
AAAI 2018
Subjects
cs.LG
cs.AI
Publication Status
Published
Start Date
2018-02-02
Finish Date
2018-02-07
Coverage Spatial
New Orleans, LA, USA