
Public defence: Carla Schenker

Carla Schenker will defend her thesis “A Flexible Framework for Data Fusion Based on Coupled Matrix and Tensor Factorizations for Interpretable Pattern Discovery” for the PhD program in Engineering Science.

This event will also be available via live stream. 

Trial Lecture

The trial lecture starts at 10:00. Please do not enter the room after the lecture has begun. 

Title: «Uniqueness of coupled matrix and tensor models with (partially) shared factors and constraints: an overview of known results».

Public defence

The candidate will defend her thesis at 12:00. Please do not enter the room after the defence has begun.

Title of the thesis: “A Flexible Framework for Data Fusion Based on Coupled Matrix and Tensor Factorizations for Interpretable Pattern Discovery”.

Ordinary opponents

Leader of the evaluation committee / Chair of the committee

Hugo Lewi Hammer, Professor, Ph.D., Department of Computer Science, Faculty of Technology, Art and Design, OsloMet, Oslo, Norway.

Leader of the public defence

Anis Yazidi, Professor, Department of Computer Science, Innovation, Digital Transformation and Sustainability, Faculty of Technology, Art and Design, OsloMet, Oslo.

Supervisors

Abstract

Data fusion is the task of jointly analyzing multiple interrelated data sets such that they can interact and inform each other. It is an indispensable data analysis approach in application areas such as medicine, chemometrics and remote sensing, where information about the same phenomenon is acquired from multiple modalities, e.g., multiple sensing technologies. While none of the modalities alone can provide a complete picture of the phenomenon, data from different modalities can complement each other. For instance, different imaging techniques like electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) provide complementary temporal and spatial resolutions of brain activity.

Data can often be represented in the form of matrices and higher-order tensors, i.e., multiway arrays. EEG imaging data, for example, can be organized as a three-way tensor with the modes subjects, time and electrodes. Coupled matrix and tensor factorizations, which model each data set as a sum of low-rank components, are an effective approach for the joint analysis of such data sets and can be used to extract interpretable latent patterns that give insight into the underlying processes generating the data.
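As a rough illustration of the model structure described above, the following sketch builds a three-way tensor as a sum of rank-one components and couples it to a matrix through a shared subject-mode factor. All sizes, variable names and the choice of the coupled mode are assumptions made for illustration and are not taken from the thesis. The sketch only constructs data that follows the model; it does not fit a factorization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: subjects x time x electrodes tensor, subjects x features matrix.
n_subjects, n_time, n_electrodes, n_features, rank = 10, 50, 8, 20, 3

A = rng.standard_normal((n_subjects, rank))    # subject-mode factor, shared by both data sets
B = rng.standard_normal((n_time, rank))        # time-mode factor (tensor only)
C = rng.standard_normal((n_electrodes, rank))  # electrode-mode factor (tensor only)
V = rng.standard_normal((n_features, rank))    # feature-mode factor (matrix only)

# CP-structured tensor: entry (i, j, k) is sum_r A[i, r] * B[j, r] * C[k, r].
tensor = np.einsum("ir,jr,kr->ijk", A, B, C)

# The same tensor written explicitly as a sum of rank-one (outer product) terms.
tensor_explicit = sum(
    np.multiply.outer(np.multiply.outer(A[:, r], B[:, r]), C[:, r])
    for r in range(rank)
)

# Coupled matrix: low-rank, and it reuses the subject-mode factor A.
matrix = A @ V.T
```

Because both data sets are generated from the same factor `A`, a joint factorization can in principle recover subject patterns that are informed by both modalities.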

However, data sets obtained from multiple sources are often heterogeneous, which poses many challenges in data fusion. For instance, the data sets can consist of different data types, have different sizes and dimensions, have different noise characteristics, be recorded with different sampling rates, or be of both dynamic and static nature. Furthermore, data sets can have both shared and unshared components. To account for the different characteristics of the data sets, coupled matrix and tensor factorization models need to incorporate different tensor decomposition models, different loss functions and diverse types of coupling structures between data sets. In addition, various constraints and regularizations are often needed to promote identifiability and interpretability of the extracted patterns.

In this thesis, first, a coupled matrix and tensor factorization model that has the potential to automatically reveal shared and unshared components is applied to a multi-modal neuroimaging data set and potential biomarkers of a psychiatric disorder are extracted. We present a systematic study of this coupled matrix and tensor factorization model for biomarker discovery, demonstrating both the effectiveness and the limitations of the model.

In the main part of the thesis, a flexible algorithmic framework for constrained linearly coupled matrix and tensor factorizations is proposed. The framework supports a wide range of important constraints, regularizations and loss functions as well as linear coupling relations in a seamless way. The framework facilitates the use of two different tensor decomposition models, namely the popular CANDECOMP/PARAFAC (CP) model as well as the PARAFAC2 model. Furthermore, we introduce a new algorithm for fitting PARAFAC2 models that makes it possible to flexibly impose various constraints on all modes of PARAFAC2.
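For readers unfamiliar with the two decomposition models named above, they can be written slice-wise in one common textbook convention. The notation below is an assumption for illustration and is not necessarily the notation used in the thesis: X_k is the k-th slice of the data, A is a factor matrix shared by all slices, D_k is a diagonal matrix, and R is the number of components.

```latex
% CP: every slice X_k shares the same factor matrices A and B.
X_k \approx A \, D_k \, B^{\mathsf{T}}, \qquad k = 1, \dots, K

% PARAFAC2: each slice has its own factor matrix B_k,
% restricted by a constant cross-product across slices.
X_k \approx A \, D_k \, B_k^{\mathsf{T}}, \qquad
B_k^{\mathsf{T}} B_k = \Phi \quad \text{for all } k = 1, \dots, K
```

In this form the PARAFAC2 model allows one mode to vary from slice to slice, which is what makes it suitable for patterns that evolve over time or for slices of different sizes.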

We show through experiments on synthetic data that our proposed approach can accurately extract the true underlying components in a variety of settings, and that its computational efficiency is competitive with, and in some cases superior to, that of state-of-the-art methods. Furthermore, we demonstrate the promise of PARAFAC2-based coupled matrix and tensor factorization models for the joint analysis of dynamic and static data sets, building on the ability of the PARAFAC2 model to account for either evolving patterns or individual time profiles in dynamic data. Experiments on real data from chemometrics and remote sensing show the versatility and applicability of the proposed framework when employing various constraints and linear coupling structures.