Model Selection for Large-Scale Scientific Simulators

Project ID: 2228cd1432 (You will need this ID for your application)

UCL Lead department: Statistical Science

Project Summary:

Simulator-based models provide scientists with insights into the natural world that would be impossible to obtain through experiments or observations alone. A key challenge in this context is to answer the question: “Which simulator gives the best fit to the experimental data?” This model selection task is made particularly challenging because simulators are often used when an important statistical quantity is unavailable: the likelihood function. Additionally, scientific simulators are often mildly misspecified, since they do not capture all the dynamics underlying the problem, and they are influenced by nuisance parameters which are unknown and need to be estimated.

This project will aim to answer the question above through the development of novel hypothesis tests. One rigorous approach to model selection is a relative test, which determines which of two simulator-based models is more likely to have generated the experimental data. To deal with nuisance parameters, these tests need to be composite, meaning they can be used to compare two parametric families of models (rather than two fixed models). Finally, we would like our tests to be applicable to different categories of data, including continuous data, discrete data, graphs, or even images. To achieve all the above, our approach will be to construct relative composite tests building on existing work in the family of kernel-based tests. The project will also involve specialising the introduced methods to simulator-based particle physics data sets, where they have the potential to make a substantial impact in finding new physical signals.
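To give a flavour of the kernel-based ingredients involved, the following is a minimal sketch rather than the project's method. It assumes a Gaussian kernel with a fixed bandwidth and uses the maximum mean discrepancy (MMD) to compare observed data against samples from two fixed simulators. The function names, the bandwidth, and the toy Gaussian "simulators" are hypothetical choices made for illustration. The sketch only computes a relative statistic; it does not calibrate a rejection threshold and it does not handle the composite case, in which nuisance parameters would first have to be estimated.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth):
    """Gaussian kernel matrix between the rows of x and the rows of y."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of the squared MMD between samples x and y."""
    n, m = len(x), len(y)
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    # Exclude diagonal terms for the within-sample averages.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    return term_xx + term_yy - 2.0 * k_xy.mean()


def relative_mmd_statistic(observed, sim_a, sim_b, bandwidth=1.0):
    """Difference of squared MMDs: negative values favour simulator A."""
    return (
        mmd2_unbiased(observed, sim_a, bandwidth)
        - mmd2_unbiased(observed, sim_b, bandwidth)
    )


# Toy stand-ins for experimental data and two simulators (assumptions).
rng = np.random.default_rng(0)
observed = rng.normal(0.0, 1.0, size=(500, 1))
sim_a = rng.normal(0.1, 1.0, size=(500, 1))  # close to the observed data
sim_b = rng.normal(1.5, 1.0, size=(500, 1))  # clearly shifted

stat = relative_mmd_statistic(observed, sim_a, sim_b)
print("relative MMD statistic:", stat)
```

In this toy setting the statistic is negative, indicating that the samples from simulator A are closer to the observed data than those from simulator B. Because kernels can be defined on graphs, images, and discrete objects as well as on vectors, the same construction extends in principle to the data types listed above by swapping the kernel.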

Applicants should have a strong background (e.g. a Master’s degree) in statistics, applied mathematics, machine learning, or a closely related field, and should also have research experience (e.g. a research project) in one or more of: statistical inference, probabilistic machine learning, or kernel methods. Strong programming skills (e.g. R, Python, Julia) are desirable.