Latent factor representations of dynamic networks with applications in cyber-security
File(s)
Author(s)
Sanna Passino, Francesco
Type
Thesis
Abstract
Dynamic networks frequently arise in nature, representing, for example, neuronal connectivity in the brain, internet connections between computers, and human interactions within social networks. Motivated by specific characteristics of computer networks, this thesis gives novel contributions to three important branches of statistical analysis of dynamic graphs: modelling of connectivity events, graph clustering, and link prediction. Nodes and edges within the network will be assumed to have latent, unobserved characteristics, estimated using statistical latent factor models.
The first part of this work proposes edge-based models for individual connection events. A model for dynamic network evolution based on the Pitman-Yor process is proposed, which is used for network-wide anomaly detection. Furthermore, a model for classification of periodic arrivals in event time data is proposed, to separate human and automated activity within individual edges. Correct classification of these two types of activity is fundamental for effective intrusion detection.
In its second part, this work proposes novel methodologies for graph clustering. Finding nodes with similar connectivity patterns is important for reliable network monitoring. A model for simultaneous estimation of the latent dimension and number of communities in spectral clustering is proposed, under a generalised random dot product graph interpretation of the stochastic blockmodel. The model is then extended to allow for heterogeneous within-community degree distributions, proposing a novel spectral clustering algorithm under the degree corrected stochastic blockmodel.
Finally, in the third part of this thesis, link prediction methods are discussed. The ability to correctly associate anomaly scores with the connections in a network is crucial for the cyber-defence of an organisation. In this work, it is demonstrated that random dot product graphs are a powerful and scalable tool for link prediction in large graph. Furthermore, an extension of the popular Poisson matrix factorisation model is proposed, which correctly models binary matrices and allows the effects of nodal covariates to be estimated. Scalable inferential techniques are also discussed.
Overall, this work gives contributions towards a unified model for cyber-security applications, in a Bayesian framework, with the ultimate aim to dynamically predict the connectivity of IP addresses, or users and hosts, in a computer network.
The first part of this work proposes edge-based models for individual connection events. A model for dynamic network evolution based on the Pitman-Yor process is proposed, which is used for network-wide anomaly detection. Furthermore, a model for classification of periodic arrivals in event time data is proposed, to separate human and automated activity within individual edges. Correct classification of these two types of activity is fundamental for effective intrusion detection.
In its second part, this work proposes novel methodologies for graph clustering. Finding nodes with similar connectivity patterns is important for reliable network monitoring. A model for simultaneous estimation of the latent dimension and number of communities in spectral clustering is proposed, under a generalised random dot product graph interpretation of the stochastic blockmodel. The model is then extended to allow for heterogeneous within-community degree distributions, proposing a novel spectral clustering algorithm under the degree corrected stochastic blockmodel.
Finally, in the third part of this thesis, link prediction methods are discussed. The ability to correctly associate anomaly scores with the connections in a network is crucial for the cyber-defence of an organisation. In this work, it is demonstrated that random dot product graphs are a powerful and scalable tool for link prediction in large graph. Furthermore, an extension of the popular Poisson matrix factorisation model is proposed, which correctly models binary matrices and allows the effects of nodal covariates to be estimated. Scalable inferential techniques are also discussed.
Overall, this work gives contributions towards a unified model for cyber-security applications, in a Bayesian framework, with the ultimate aim to dynamically predict the connectivity of IP addresses, or users and hosts, in a computer network.
Version
Open Access
Date Issued
2020-10
Date Awarded
2021-02
Copyright Statement
Creative Commons Attribution - NonCommercial 4.0
International Licence
International Licence
License URL
Advisor
Heard, Nicholas
Gandy, Axel
Publisher Department
Mathematics
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)