Scaling all-goals updates in reinforcement learning using convolutional neural networks
File(s)
Pardo_AAAI-2020.pdf (985.47 KB)
Accepted version
Author(s)
Pardo, Fabio
Levdik, Vitaly
Kormushev, Petar
Type
Conference Paper
Abstract
Being able to reach any desired location in the environment can be a valuable asset for an agent. Learning a policy to navigate between all pairs of states individually is often not feasible. An all-goals updating algorithm uses each transition to learn Q-values towards all goals simultaneously and off-policy. However, the expensive numerous updates in parallel limited the approach to small tabular cases so far. To tackle this problem we propose to use convolutional network architectures to generate Q-values and updates for a large number of goals at once. We demonstrate the accuracy and generalization qualities of the proposed method on randomly generated mazes and Sokoban puzzles. In the case of on-screen goal coordinates the resulting mapping from frames to distance-maps directly informs the agent about which places are reachable and in how many steps. As an example of application we show that replacing the random actions in ε-greedy exploration by several actions towards feasible goals generates better exploratory trajectories on Montezuma's Revenge and Super Mario All-Stars games.
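As an illustration of the all-goals updating idea mentioned in the abstract, the following is a minimal tabular sketch, not the convolutional method of the paper. It assumes a hypothetical 1-D chain environment in which every state is also a candidate goal, a reward of 1 when the next state equals the goal (treated as terminal for that goal) and 0 otherwise, and illustrative values for the step size and discount. The function names (`all_goals_update`, `step`, `train`) are invented for this example.

```python
import random


def all_goals_update(q, s, a, s_next, alpha=0.5, gamma=0.9):
    """Apply one all-goals Q-learning update for the transition (s, a, s_next).

    q[g][s][a] is the Q-value of action a in state s when the goal is g.
    Assumed reward convention: 1 if s_next is the goal (terminal), else 0.
    """
    for g in range(len(q)):
        if s_next == g:
            target = 1.0  # goal reached, no bootstrapping
        else:
            target = gamma * max(q[g][s_next])  # off-policy bootstrap
        q[g][s][a] += alpha * (target - q[g][s][a])


def step(s, a, n_states):
    """Hypothetical chain dynamics: action 0 moves left, 1 moves right."""
    delta = 1 if a == 1 else -1
    return max(0, min(n_states - 1, s + delta))


def train(n_states=5, n_steps=5000, seed=0):
    """Follow a uniformly random policy; every transition updates all goals."""
    rng = random.Random(seed)
    q = [[[0.0, 0.0] for _ in range(n_states)] for _ in range(n_states)]
    s = rng.randrange(n_states)
    for _ in range(n_steps):
        a = rng.randrange(2)
        s_next = step(s, a, n_states)
        all_goals_update(q, s, a, s_next)
        s = s_next
    return q


if __name__ == "__main__":
    q = train()
    # Greedy action from state 0 when the goal is the rightmost state.
    print(max(range(2), key=lambda a: q[4][0][a]))
```

Under these assumptions the behaviour policy is purely random, yet each goal's Q-values are learned from the same stream of transitions. The inner loop over goals is the cost that grows with the number of goals, which is the limitation the abstract says the convolutional architecture is meant to address.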
Date Issued
2020-02-01
Date Acceptance
2019-11-25
Citation
Proc. 34th AAAI Conference on Artificial Intelligence (AAAI 2020), 2020, 34 (4), pp.5355-5362
ISSN
2374-3468
Publisher
Association for the Advancement of Artificial Intelligence
Start Page
5355
End Page
5362
Journal / Book Title
Proc. 34th AAAI Conference on Artificial Intelligence (AAAI 2020)
Volume
34
Issue
4
Copyright Statement
© 2020, Association for the Advancement of Artificial Intelligence.
Identifier
http://kormushev.com/papers/Pardo_AAAI-2020.pdf
Source
34th AAAI Conference on Artificial Intelligence (AAAI 2020)
Publication Status
Published
Start Date
2020-02-07
Finish Date
2020-02-12
Coverage Spatial
New York, USA