Privacy-preserving machine learning system at the edge
| File | Description | Size | Format | |
| --- | --- | --- | --- | --- |
| Mo-F-2022-PhD-Thesis.pdf | Thesis | 3.22 MB | Adobe PDF | View/Open |
Title: | Privacy-preserving machine learning system at the edge |
Authors: | Mo, Fan |
Item Type: | Thesis or dissertation |
Abstract: | Data privacy in machine learning has become an urgent problem, as machine learning develops rapidly and its large attack surface is increasingly explored. Pre-trained deep neural networks are deployed in smartphones and other edge devices for a growing variety of applications, creating potential disclosures of private information. In collaborative learning, participants keep private data locally and communicate only deep neural networks updated on that local data, yet the private information encoded in the networks' gradients can still be extracted by adversaries. This dissertation investigates privacy leakage from neural networks and proposes privacy-preserving machine learning systems for edge devices. First, a systematization of knowledge identifies the key challenges and the existing or adaptable solutions. Next, a framework is proposed to measure, based on generalization error, the amount of sensitive information memorized in each layer's weights of a neural network. Results show that, considered individually, the last layers encode more information from the training data than the first layers. To protect this sensitive information in weights, DarkneTZ is proposed: a framework that uses an edge device's Trusted Execution Environment (TEE) together with model partitioning to limit the attack surface against neural networks. DarkneTZ's performance, including CPU execution time, memory usage, and accurate power consumption, is evaluated using two small and six large image classification models. Because the TEE of an edge device has limited memory, the model's layers are partitioned into the more sensitive layers, executed inside the TEE, and the remaining layers, executed in the untrusted part of the operating system. Results show that hiding even a single layer provides reliable model privacy and defends against state-of-the-art membership inference attacks, with only a 3% performance overhead. The thesis then extends the investigation from neural network weights (in on-device machine learning deployment) to gradients (in collaborative learning). An information-theoretic framework, which adapts usable information theory and treats the attack outcome as a probability measure, is proposed to quantify private information leakage from network gradients. Both the original private information and latent information are localized in a layer-wise manner. A sensitivity analysis of the gradients with respect to private information then explores the underlying cause of the leakage. Numerical evaluations on six benchmark datasets and four well-known networks further measure the impact of training hyper-parameters and defense mechanisms. Finally, to limit privacy leakage from gradients, a Privacy-preserving Federated Learning (PPFL) framework for mobile systems is proposed and implemented. TEEs are used on clients for local training and on servers for secure aggregation, so that model and gradient updates are hidden from adversaries. PPFL leverages greedy layer-wise training to train each layer of the model inside the trusted area until it converges. The performance evaluation of the implementation shows that PPFL significantly improves privacy by defending against data reconstruction, property inference, and membership inference attacks, while incurring only small communication and client-side system overheads. Overall, this thesis offers a better understanding of where private information resides in machine learning and provides frameworks that guarantee privacy while achieving model utility and system overhead comparable to a regular machine learning framework. Illustrative code sketches of the partitioning, gradient-analysis, and layer-wise training ideas appear after this record. |
Content Version: | Open Access |
Issue Date: | May-2022 |
Date Awarded: | Oct-2022 |
URI: | http://hdl.handle.net/10044/1/100362 |
DOI: | https://doi.org/10.25560/100362 |
Copyright Statement: | Creative Commons Attribution NonCommercial Licence |
Supervisor: | Haddadi, Hamed; Boyle, David |
Sponsor/Funder: | CSC Scholarships |
Funder's Grant Number: | 201908060168 |
Department: | Dyson School of Design Engineering |
Publisher: | Imperial College London |
Qualification Level: | Doctoral |
Qualification Name: | Doctor of Philosophy (PhD) |
Appears in Collections: | Design Engineering PhD theses |
This item is licensed under a Creative Commons License
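
To illustrate the DarkneTZ idea described in the abstract, here is a minimal, hedged sketch of TEE-style model partitioning. It assumes a PyTorch `nn.Sequential` model; the `TrustedPartition` class and `CUT_POINT` constant are hypothetical stand-ins for the real TrustZone-backed execution, which this sketch does not implement.

```python
# Minimal sketch of DarkneTZ-style model partitioning (illustrative only).
# A plain Python class stands in for the Arm TrustZone TEE; in the real
# system the sensitive layers and their weights never leave the trusted world.
import torch
import torch.nn as nn

class TrustedPartition:
    """Hypothetical stand-in for layers executed inside the device TEE."""

    def __init__(self, layers: nn.Sequential):
        self.layers = layers  # weights of the sensitive (last) layers

    def infer(self, activations: torch.Tensor) -> torch.Tensor:
        # Only intermediate activations cross into the "TEE"; only the
        # final prediction leaves it.
        with torch.no_grad():
            return self.layers(activations)

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 64), nn.ReLU(),
    nn.Linear(64, 10),  # last layers memorize the most training data,
)                       # so they are the ones hidden inside the TEE

CUT_POINT = 4  # illustrative cut: first 4 modules untrusted, rest trusted
untrusted = model[:CUT_POINT]                  # normal OS
trusted = TrustedPartition(model[CUT_POINT:])  # "inside the TEE"

x = torch.randn(1, 3, 32, 32)
print(trusted.infer(untrusted(x)).shape)  # torch.Size([1, 10])
```

The cut point mirrors the abstract's finding that later layers memorize more: moving `CUT_POINT` earlier hides more layers, at the cost of the TEE's limited memory.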
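The abstract's gradient-leakage analysis rests on usable information theory, which is beyond a short sketch, so the toy below only prints per-layer gradient norms for one "private" batch: a much cruder proxy for where input-dependent signal concentrates in the updates a collaborative-learning client would share. The model shapes and data are made up.

```python
# Crude layer-wise proxy for gradient leakage (illustrative only): report the
# L2 norm of each parameter's gradient for a single private batch. The thesis
# itself quantifies leakage with usable information, not gradient norms.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 2),
)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 20)           # toy "private" batch
y = torch.randint(0, 2, (8,))
loss_fn(model(x), y).backward()  # the gradients a client would transmit

for name, p in model.named_parameters():
    # Larger norms hint, very roughly, at layers whose shared updates
    # depend most on the private inputs.
    print(f"{name:10s} grad L2 norm = {p.grad.norm().item():.4f}")
```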
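Finally, a hedged sketch of PPFL's greedy layer-wise federated training. TEEs and secure aggregation are replaced with ordinary Python (`fedavg` below is a plain average, a hypothetical stand-in), clients get random toy data, and a temporary classifier head is attached to non-final layers, as greedy layer-wise schemes typically require; none of this mirrors the thesis's actual implementation.

```python
# Sketch of PPFL-style greedy layer-wise federated learning (illustrative).
# Each layer is trained to "convergence" (a fixed round budget here) before
# the next one starts; earlier layers stay frozen. In PPFL the local step
# runs inside a client TEE and aggregation inside a server TEE.
import copy
import torch
import torch.nn as nn

def fedavg(states):
    """Plain parameter averaging; stands in for TEE-protected aggregation."""
    avg = copy.deepcopy(states[0])
    for key in avg:
        avg[key] = torch.stack([s[key] for s in states]).mean(dim=0)
    return avg

layers = [nn.Linear(10, 16), nn.Linear(16, 16), nn.Linear(16, 2)]
n_clients, rounds = 3, 5
loss_fn = nn.CrossEntropyLoss()

for i, layer in enumerate(layers):          # greedy: one layer at a time
    is_last = i == len(layers) - 1
    for _ in range(rounds):                 # fixed stand-in for convergence
        states = []
        for _ in range(n_clients):          # each client, "inside its TEE"
            local = copy.deepcopy(layer)
            # Non-final layers need a temporary head to produce a loss;
            # the head is discarded after aggregation.
            head = nn.Identity() if is_last else nn.Linear(16, 2)
            opt = torch.optim.SGD(
                list(local.parameters()) + list(head.parameters()), lr=0.1
            )
            x, y = torch.randn(16, 10), torch.randint(0, 2, (16,))
            h = x
            with torch.no_grad():           # frozen earlier layers
                for frozen in layers[:i]:
                    h = torch.relu(frozen(h))
            out = local(h) if is_last else head(torch.relu(local(h)))
            loss = loss_fn(out, y)
            opt.zero_grad()
            loss.backward()
            opt.step()
            states.append(local.state_dict())
        layer.load_state_dict(fedavg(states))  # update never leaves "TEEs"
```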