Defining and Measuring Robustness in Bayesian Machine Learning
Project ID: 2228cd1424 (You will need this ID for your application)
Under Offer
Research Theme: Mathematical Sciences
UCL Lead department: Statistical Science
Lead Supervisor: Francois-Xavier Briol
Project Summary:
Robustness refers to the ability of a model to perform well on unseen data, or on data that differs from the data it was trained on. It is an ever-evolving challenge for practitioners of statistical and machine learning methods, who need to deal with large, complex, and uncurated data sets and build tools that are reliable in uncontrolled environments. It is also particularly important in safety-critical applications, such as medical diagnosis, self-driving cars, or the criminal justice system, where a lack of robustness can have a severe impact.
In this project, we will focus on the robustness of Bayesian machine learning. In Bayesian inference, we start with a prior belief about a quantity of interest, and then update this belief based on our model of the world and new evidence in the form of data. This allows us to formally describe our uncertainty and make reliable predictions. However, a crucial assumption is that the model can truly represent the data-generating process. When this assumption is violated, the model is called misspecified, and our predictions become unreliable. To remedy this issue, a wide range of approaches have been proposed to make Bayesian methods more robust.
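As a rough illustration of the updating step and of what misspecification can do, the sketch below uses a conjugate Normal model for an unknown mean with known noise variance. The model, the prior settings, the toy data, and the function name `normal_posterior` are all assumptions chosen for this example and are not part of the project itself. A single gross outlier, which the Normal likelihood cannot account for, drags the posterior mean far from the bulk of the data while the posterior variance still shrinks.

```python
def normal_posterior(data, prior_mean=0.0, prior_var=10.0, noise_var=1.0):
    """Posterior over an unknown mean under a Normal prior and a Normal
    likelihood with known variance (standard conjugate update)."""
    n = len(data)
    # Precisions (inverse variances) add under conjugate Normal updating.
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    # Posterior mean is a precision-weighted blend of prior mean and data.
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var


# Hypothetical toy data: five well-behaved observations near 1.0 ...
clean = [0.9, 1.1, 1.0, 0.8, 1.2]
# ... and the same data plus one gross outlier the model cannot explain.
contaminated = clean + [50.0]

clean_mean, clean_var = normal_posterior(clean)
cont_mean, cont_var = normal_posterior(contaminated)

print(f"clean:        mean={clean_mean:.3f}, var={clean_var:.3f}")
print(f"contaminated: mean={cont_mean:.3f}, var={cont_var:.3f}")
```

In this toy setting the contaminated posterior is both further from the bulk of the data and more concentrated, which is one simple way in which a misspecified model can yield confident but unreliable conclusions.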
Nevertheless, two fundamental questions are often brushed over and remain open: “How should we mathematically define Bayesian robustness?” and “How should we measure robustness in practice?”. Answering these questions is key to providing a rigorous comparison of existing so-called “robust” Bayesian methods, and to helping practitioners choose which algorithms to use for a given problem.
This PhD project will aim to answer these questions through rigorous mathematical analysis and computational benchmarking. A strong candidate will therefore have a solid mathematical or statistical background, good programming experience (preferably in Python), and a demonstrable interest in machine learning.