Higher order tensor decompositions for machine intelligence
File | Description | Size | Format |
---|---|---|---|
Calvi-GG-2021-PhD-Thesis.pdf | Thesis | 27.06 MB | Adobe PDF |
Title: | Higher order tensor decompositions for machine intelligence |
Authors: | Calvi, Giuseppe Giovanni |
Item Type: | Thesis or dissertation |
Abstract: | Recent years have witnessed a surge in the routine creation of large amounts of data. While traditional tools of linear algebra are still adequate for their analysis when the considered datasets are not particularly voluminous or complex, the unprecedented changes associated with Big Data have also highlighted the limitations of flat-view matrix models. This calls for the development of sophisticated tensor and multi-linear algebra algorithms that are naturally able to efficiently manipulate multi-dimensional data. The overarching theme of this thesis is hence to develop novel, interpretable tensor algorithms and to show their practical utility in signal processing and machine learning applications. In the first part of the thesis, fundamental and advanced concepts commonly expressed with mathematical formulae are elucidated through graphical representations via Tensor Networks (TNs), which allow tensor equations to be visualized as an interaction of nodes and edges. This has made it possible to provide new perspectives on core principles and has paved the way for novel results, such as a method to perform tensor contractions through graphical representations of Tensor-Trains (TTs). Then, methods to ease the computational burden of the renowned Canonical Polyadic Decomposition (CPD) are presented. Firstly, it is shown how a prior TT-decomposition can reduce the CPD computational cost with respect to the tensor order from exponential to linear. Secondly, a lower bound on the tensor rank, R, is introduced along with conditions under which it is attained, allowing for an increased efficiency of the CPD computation. In the second part of the thesis, a framework for TN summation is introduced through an analogy with feature fusion. A practical application on the ETH-80 dataset has shown that the proposed framework preserves feature locality. Through a process of deflation, it is also demonstrated how TN summation can be employed for eigenvalue extraction of large-scale matrices. Next, a method to allow the use of kernels in the Support Tensor Machine (STM), the tensor extension of the well-known SVM, is developed. This has made it possible to successfully apply it to a problem of financial forecasting. In the third and final part of this thesis, tensors are applied to deep learning. The introduction of the Tucker Tensor Layer (TTL) as an alternative to the fully connected matrices in Neural Networks (NNs) has achieved a more than 60-fold compression on NNs applied to the classification of the MNIST and Fashion-MNIST datasets, at a small sacrifice in accuracy. Moreover, the novel analytical derivation of the tensor-valued back-propagation algorithm makes it possible to gain insights into the training process, thus mitigating the notorious “black-box” issue inherent to NNs. Practical benefits of this result have been shown on the CIFAR-10 dataset. Finally, Tensor-Train Recurrent Neural Networks (TT-RNNs) have been applied to financial forecasting, further highlighting the compression and interpretability properties of tensors. |
Content Version: | Open Access |
Issue Date: | Nov-2020 |
Date Awarded: | Feb-2021 |
URI: | http://hdl.handle.net/10044/1/87839 |
DOI: | https://doi.org/10.25560/87839 |
Copyright Statement: | Creative Commons Attribution Non-Commercial NoDerivatives Licence |
Supervisor: | Mandic, Danilo |
Sponsor/Funder: | Engineering and Physical Sciences Research Council |
Funder's Grant Number: | 1895651 |
Department: | Electrical and Electronic Engineering |
Publisher: | Imperial College London |
Qualification Level: | Doctoral |
Qualification Name: | Doctor of Philosophy (PhD) |
Appears in Collections: | Electrical and Electronic Engineering PhD theses |
This item is licensed under a Creative Commons License