Reinforcement learning for proactive content caching in wireless networks
Author(s)
Somuyiwa, Samuel Olusegun
Type
Thesis or dissertation
Abstract
Proactive content caching (PC) at the edge of wireless networks, that is, at the base stations (BSs) and/or user equipments (UEs), is a promising strategy for handling ever-growing mobile data traffic and improving the quality of service of content delivery over wireless networks. However, limited storage capacity and time variations in both wireless channel conditions and the content demand profile pose challenges that must be addressed in order to realise the benefits of PC at the wireless edge.
This thesis aims to develop PC solutions that address these challenges. We consider PC directly at UEs equipped with finite-capacity cache memories, within the framework of a dynamic system in which mobile users randomly request contents from a non-stationary content library: new contents are added to the library over time, and each content may remain in the library for a random lifetime, within which it may be requested. Contents are delivered through wireless channels with time-varying quality, and whenever contents are transmitted, the system incurs a transmission cost that depends on the number of bits downloaded and the channel quality of the receiving user(s) at that time. We formulate each considered problem as a Markov decision process with the objective of minimising the long-term expected average cost of the system. We then use reinforcement learning (RL) to solve these highly challenging problems, which have prohibitively large state and action spaces. In particular, we employ policy approximation techniques for compact representation of complex policy structures, and policy gradient RL methods to train the system. In the single-user setting, we show the optimality of a threshold-based PC scheme that adapts to the system dynamics. We use this result to characterise and design a multicast-aware PC scheme, based on a deep RL framework, for the multi-user setting. We perform extensive numerical simulations of the proposed schemes. Our results show not only significant improvements over state-of-the-art reactive content delivery approaches, but also near-optimality of the proposed RL solutions, based on comparisons with lower bounds.
Version
Open Access
Date Issued
2019-05
Online Publication Date
2021-01-31T00:01:33Z
2021-02-26T15:10:45Z
Date Awarded
2020-02
Copyright Statement
Creative Commons Attribution NonCommercial Licence
Advisor
Gunduz, Deniz
Gyorgy, Denes
Sponsor
Petroleum Technology Development Fund
Publisher Department
Electrical and Electronic Engineering
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)