Efficient, secure and private distributed computation and machine learning
File(s)
Author(s)
Hasircioglu, Burak
Type
Thesis or dissertation
Abstract
Distributed computation is the process of breaking down computational tasks and executing them across multiple nodes rather than relying on a single machine. This approach is essential in the age of big data, where the volume and variety of data can overwhelm individual machines. However, preserving privacy remains a critical concern in distributed computation, as adversarial nodes may analyse the data to learn about the individuals it describes. Additionally, the involvement of multiple nodes can lead to arbitrary delays in computation if any node is slow or unreliable. This dissertation addresses specific research problems concerning efficiency, security, and privacy in distributed computation and federated learning.
The research presented in this dissertation is two-fold. First, we delve into the realm of coded computation, a framework that has been widely used to address reliability and security issues in distributed computation. We propose a novel coding scheme, bivariate polynomial codes, designed to address two major problems in distributed matrix multiplication: straggler mitigation and data security. Unlike prior polynomial coding schemes, ours efficiently utilizes all worker nodes, including stragglers, in a storage- and upload-cost-efficient manner.
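To illustrate the coded-computation setting, the minimal sketch below implements a classical univariate polynomial code for distributed matrix multiplication, the family of constructions that bivariate polynomial codes generalize; the block counts, worker count, and all names are illustrative assumptions, not the thesis's scheme. Any m*n of the N worker results suffice to recover the product, so slow workers can simply be ignored.

# Sketch of univariate polynomial coding for distributed matrix
# multiplication; toy sizes and names are illustrative assumptions.
import numpy as np

m, n = 2, 2            # row-blocks of A, column-blocks of B
N = 6                  # workers; N >= m*n tolerates N - m*n stragglers
rng = np.random.default_rng(0)

A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 4))
A_blocks = np.split(A, m, axis=0)          # A = [A_0; A_1]
B_blocks = np.split(B, n, axis=1)          # B = [B_0, B_1]

xs = np.arange(1, N + 1, dtype=float)      # distinct evaluation points

# Encoding: worker k receives p_A(x_k) and p_B(x_k), where
# p_A(x) = sum_i A_i x^i and p_B(x) = sum_j B_j x^(j*m).
def encode(blocks, x, stride):
    return sum(blk * x ** (idx * stride) for idx, blk in enumerate(blocks))

# Each worker multiplies its coded blocks, evaluating the
# degree-(m*n - 1) polynomial h(x) = p_A(x) p_B(x) at x_k.
worker_out = [encode(A_blocks, x, 1) @ encode(B_blocks, x, m) for x in xs]

# Decoding: any m*n = 4 results suffice; say workers 1 and 4 straggle.
alive = [0, 2, 3, 5]
V = np.vander(xs[alive], m * n, increasing=True)   # Vandermonde system
coeffs = np.linalg.solve(V, np.stack([worker_out[k].ravel() for k in alive]))

# Coefficient i + j*m of h(x) is exactly the product block A_i B_j.
blk_shape = worker_out[0].shape
C = np.block([[coeffs[i + j * m].reshape(blk_shape) for j in range(n)]
              for i in range(m)])
assert np.allclose(C, A @ B)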
Then, we shift our focus to private federated learning, a technique that allows models to be trained privately and collaboratively on decentralized data sources without transferring the data to a central location. We first investigate privacy amplification via client sampling using over-the-air computation to provide anonymity. We then theoretically analyse the joint effect of client and local dataset sampling on privacy amplification in conventional federated learning settings. Finally, we propose a compression scheme based on subtractive dithering, which provides differential privacy guarantees and is communication-efficient for private federated learning.
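For context on privacy amplification by sampling: the standard subsampling result, which the dissertation's joint client-and-dataset analysis refines, states that running an $(\varepsilon, \delta)$-differentially-private mechanism on a random subsample drawn with rate $q$ yields

\varepsilon' = \log\bigl(1 + q\,(e^{\varepsilon} - 1)\bigr), \qquad \delta' = q\,\delta,

so sampling only a small fraction of clients or data points substantially tightens the privacy guarantee.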
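The sketch below shows subtractive dithered quantization, the primitive behind the proposed compression scheme. The step size, the seed handling, and the omission of any formal privacy accounting are simplifying assumptions; the thesis derives the actual differential privacy guarantees.

# Sketch of subtractive dithered quantization of a client update;
# step size and seed handling are illustrative assumptions.
import numpy as np

delta = 0.25                                          # quantization step
grad = np.random.default_rng(1).standard_normal(8)    # a client update

# Client and server derive the same dither from a shared seed, so only
# the integer quantization indices need to be transmitted.
shared = np.random.default_rng(42)
u = shared.uniform(-delta / 2, delta / 2, size=grad.shape)

indices = np.round((grad + u) / delta).astype(int)    # client -> server
recovered = indices * delta - u                       # server subtracts dither

# The reconstruction error is uniform on [-delta/2, delta/2] and
# independent of the input, which is what makes the dither usable as
# part of the privacy noise while keeping communication cheap.
assert np.max(np.abs(recovered - grad)) <= delta / 2 + 1e-12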
Version
Open Access
Date Issued
2024-01
Date Awarded
2024-05
Copyright Statement
Creative Commons Attribution NonCommercial Licence
Advisor
Gunduz, Deniz
Sponsor
Türkiye Bilimsel ve Teknolojik Araştırma Kurumu
Imperial College London
Publisher Department
Electrical and Electronic Engineering
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)