EIGENDATA
Project ID: 2531ad1514
(You will need this ID for your application)
Research Theme: Physical Sciences
UCL Lead department: Computer Science
Lead Supervisor: Paolo Barucca
Project Summary:
The digitalisation of large scientific datasets across disciplines, from protein interactions and metabolic networks to brain electromagnetic signals, climate time series, financial markets and marketing data, has created a growing need for tools that extract signals from multiple noisy sources. This need is also reflected in the increasing demand for specialised data scientists who apply methods from statistics and machine learning to identify consistent patterns of behaviour in a vast range of heterogeneous and noisy complex systems.
This project aims to provide new theoretical tools and intuitive methodologies for separating signal from noise and for learning static and dynamic models from complex data structures: labeled and unlabeled data, network-structured data, and time-series data. These methodologies will be based on the study and development of new theoretical results on eigenvector statistics in random matrix theory. Advances in the analytical and numerical evaluation of benchmark random patterns in eigenvectors and eigenvalues, linear-algebra quantities found ubiquitously across methods and systems, will translate into methodologies for identifying and interpreting signals in the matrix representations of complex systems.
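As an illustration of what a benchmark random pattern can look like in practice, the sketch below uses a classical eigenvalue result, the Marchenko-Pastur law, rather than any result from this project. For purely random, uncorrelated data with N variables and T observations, the eigenvalues of the sample correlation matrix concentrate (for large N and T) in the interval [(1 - sqrt(q))^2, (1 + sqrt(q))^2] with q = N/T, so eigenvalues well above the upper edge are candidates for signal. The function names, the synthetic one-factor data and the parameter values are assumptions made for this example only.

```python
import numpy as np


def marchenko_pastur_edges(n_variables, n_observations):
    """Asymptotic support of the eigenvalue bulk for a pure-noise correlation matrix."""
    q = n_variables / n_observations
    return (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2


def count_signal_eigenvalues(data):
    """Count eigenvalues of the sample correlation matrix above the noise edge.

    `data` has shape (n_observations, n_variables).
    Returns the count and the eigenvalues in ascending order.
    """
    n_observations, n_variables = data.shape
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)  # symmetric matrix, ascending order
    _, upper_edge = marchenko_pastur_edges(n_variables, n_observations)
    return int(np.sum(eigenvalues > upper_edge)), eigenvalues


# Hypothetical synthetic data: independent noise plus one common factor
# that is added to every variable (the planted "signal").
rng = np.random.default_rng(0)
T, N = 1000, 50
noise = rng.standard_normal((T, N))
common_factor = rng.standard_normal((T, 1))
data = noise + common_factor  # the factor is broadcast across all N columns

n_signal, eigenvalues = count_signal_eigenvalues(data)
lower_edge, upper_edge = marchenko_pastur_edges(N, T)
print(f"noise bulk expected in [{lower_edge:.3f}, {upper_edge:.3f}]")
print(f"largest eigenvalue: {eigenvalues[-1]:.2f}")
print(f"eigenvalues above the noise edge: {n_signal}")
```

The threshold here is an asymptotic one and ignores finite-size fluctuations at the edge, so in real data an eigenvalue only slightly above the edge should not be read as signal without further checks. The sketch also uses eigenvalues alone; the eigenvector associated with an outlying eigenvalue is what indicates which variables carry the signal.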
The project will leverage both the long-standing theoretical results in random matrix theory, which mainly focus on eigenvalue statistics, and the recent theoretical advances on eigenvector statistics, targeting applications in complexity theory and data science.
This interdisciplinary research combines the development of theoretical results in random matrix theory with the assumptions needed to analyse empirical matrices in different contexts: in complex network analysis, for spectral clustering and stability analysis, and in statistical learning, for model inference. The project will improve the level of understanding and interpretability that can be expected when applying statistical and machine learning to the study of complex systems, providing intuitive and theoretically motivated methodologies of interest to a vast community of researchers in academia and industry.