DCOPS and bandits: Exploration and exploitation in decentralised coordination
Author(s)
Stranders, Ruben
Tran-Thanh, Long
Delle Fave, Francesco Maria
Rogers, Alex
Jennings, Nick
Type
Conference Paper
Abstract
Real-life coordination problems are characterised by stochasticity and a lack of a priori knowledge about the interactions between agents. However, decentralised constraint optimisation problems (DCOPs), a widely adopted framework for modelling decentralised coordination tasks, assume perfect knowledge of these factors, thus limiting their practical applicability. To address this shortcoming, we introduce the MAB-DCOP, in which the interactions between agents are modelled by multi-armed bandits (MABs). Unlike a canonical DCOP, a MAB-DCOP is not a single-shot optimisation problem. Rather, it is a sequential one in which agents need to coordinate in order to strike a balance between acquiring knowledge about the a priori unknown and stochastic interactions (exploration), and taking the currently believed optimal joint action (exploitation), so as to maximise the cumulative global utility over a finite time horizon. We propose Heist, the first asymptotically optimal algorithm for coordination under stochasticity and lack of prior knowledge. Heist solves MAB-DCOPs in a decentralised fashion, using a generalised distributive law (GDL) message-passing phase to find the joint action with the highest upper confidence bound (UCB) on global utility. We demonstrate experimentally that Heist outperforms other state-of-the-art techniques from the MAB and DCOP literature by up to 1.5 orders of magnitude on MAB-DCOPs.
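To illustrate the exploration–exploitation trade-off the abstract describes, the following is a minimal sketch of the classical single-agent UCB1 rule that underlies UCB-style bounds; it is not Heist's decentralised GDL computation, and all function names and the Bernoulli-arm simulation are illustrative assumptions, not code from the paper.

```python
import math
import random

def ucb1_select(counts, means, t):
    """Return the arm index with the highest UCB1 score.

    counts[i] -- number of times arm i has been played
    means[i]  -- empirical mean reward of arm i
    t         -- current time step (1-indexed)
    """
    # Play each arm once before the confidence bound is defined.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    # UCB1 score: empirical mean plus an exploration bonus that
    # shrinks as an arm is played more often.
    return max(range(len(counts)),
               key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))

def run_bandit(true_means, horizon, seed=0):
    """Simulate UCB1 on Bernoulli arms; return per-arm play counts."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, horizon + 1):
        arm = ucb1_select(counts, means, t)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean
    return counts
```

Over a long enough horizon the rule concentrates plays on the better arm while still sampling the others occasionally, which is the balance a MAB-DCOP must strike jointly across agents.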
Date Issued
2012
Citation
2012, pp.289-297
Publisher
Curran
Start Page
289
End Page
297
Identifier
http://eprints.soton.ac.uk/273086/
Source
Proc. 11th Int. Conference on Autonomous Agents and Multi-Agent Systems (AAMAS)
Publication Status
Unpublished