Clustering and changepoint detection to identify suitable demand forecasting algorithms in the automotive industry
Shahnaz Abdul Hameed
Supervisors Anna-Lena Sachs, David Hofmeyr, Florian Pein
Many companies rely on accurate forecasts to make sure the right products are available at the right time. A UK automotive manufacturer faces this challenge for more than 300,000 different car parts, each with its own lifecycle. A new part may become more popular as production ramps up, while older parts may see steady or declining demand. For newer parts, only a small portion of their lifecycle has been observed, which makes it difficult to know how their demand will change in the future.
This project aims to develop statistical and computational methods that help identify when a car part is likely to move from one stage of its lifecycle to another—for example, when demand begins to grow, stabilise, or decline. To do this, the project blends two ideas: time-series clustering and changepoint detection. Time-series clustering groups together parts that show similar demand patterns, allowing us to learn from older, more complete lifecycles and use them as templates for newer parts. Changepoint detection focuses on identifying moments where behaviour shifts abruptly, such as a sudden change in demand.
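To make the changepoint idea concrete, here is a minimal sketch in Python that locates a single shift in mean demand by finding the split that most reduces the squared-error cost. The data and the ramp-up/stable structure are purely illustrative, and this baseline is far simpler than the methods the project will develop.

```python
import numpy as np

def single_changepoint(x):
    """Find the most likely single shift in mean: the split point
    that most reduces the total squared-error cost of the series."""
    full_cost = np.sum((x - x.mean()) ** 2)
    best_tau, best_gain = None, 0.0
    for tau in range(2, len(x) - 1):
        left, right = x[:tau], x[tau:]
        split_cost = (np.sum((left - left.mean()) ** 2)
                      + np.sum((right - right.mean()) ** 2))
        if full_cost - split_cost > best_gain:
            best_tau, best_gain = tau, full_cost - split_cost
    return best_tau

# Illustrative demand series: a ramp-up phase followed by a stable phase.
rng = np.random.default_rng(1)
demand = np.concatenate([rng.poisson(5, 60), rng.poisson(20, 60)]).astype(float)
print("Estimated changepoint at week", single_changepoint(demand))
```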
By combining these approaches, the project seeks to improve early detection of lifecycle changes, especially for parts with only partially observed demand histories. Better understanding of these transitions can support more accurate forecasting, enabling automotive manufacturers to reduce unnecessary stock, improve planning, and lower both financial and environmental costs.
Identifying weak signals of change: A data-driven framework for humanitarian decision-making
Jasmine Burgess
Supervisors Gabriel Wallin, Rachel McCrea
Charity Partner British Red Cross
Humanitarian issues typically have many interconnected causes, which are noisy and hard to measure directly, with relevant indicators not systematically recorded. Information may be spread across diverse sources, including traditional data such as crime statistics, conflict data and socioeconomic indices, and unstructured sources such as government reports, news articles and social media posts. To utilise this unstructured data, Natural Language Processing methods will be needed to convert text into numerical data for use in statistical models such as Latent Variable Models (LVMs). LVMs provide a method for understanding high-dimensional data by identifying a small set of unseen variables which generate the observed data and explain its variation. These latent variables are defined as functions of the observable variables; supervised rotation techniques allow them to be interpreted as concepts such as community resilience or economic deprivation. We will develop new LVMs incorporating different types of data, including textual data, to understand the causes of complex humanitarian issues and provide early warning of impending crises.
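As a toy illustration of the latent variable idea (not the new models this project will develop), the sketch below simulates several noisy indicators driven by a single hypothetical latent factor, labelled "deprivation" purely for illustration, and recovers it with off-the-shelf factor analysis; all parameter values are invented.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_areas = 500
# Hypothetical latent factor (e.g. economic deprivation) driving
# four noisy observed indicators with different loadings.
deprivation = rng.normal(size=n_areas)
loadings = np.array([0.9, 0.7, 0.8, 0.5])
X = np.outer(deprivation, loadings) + 0.4 * rng.normal(size=(n_areas, 4))

fa = FactorAnalysis(n_components=1).fit(X)
scores = fa.transform(X)[:, 0]  # estimated latent value for each area
# Note: a factor's sign is arbitrary, so the correlation may come out negative.
print("Estimated loadings:", fa.components_.round(2))
print("Correlation with truth:", np.corrcoef(scores, deprivation)[0, 1].round(2))
```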
The first application area will be modelling social unrest risk in Great Britain. Even though the triggers of social unrest or rioting may be hard to predict, there are long-term structural drivers of social unrest which mean that certain areas are at higher risk. For example, economic deprivation is a latent factor that is expected to be linked to social unrest. Spatial LVMs exist but are not suited to real data in this context, because the data are recorded at different spatial resolutions and frequencies. By developing more flexible spatial LVMs, we will strengthen understanding of the risks of social unrest, so that interventions can be better targeted.
EPSRC Research Areas: Natural Language Processing, Statistics and Applied Probability
Disentangling transience and population growth using robust design capture-recapture
Malcolm Connolly
Supervisors Rachel McCrea, Carolina Euan
Academic Partner University College Dublin
This project is in collaboration with one of the CDT's strategic partners, University College Dublin, and is motivated by data provided by the British Trust for Ornithology (BTO). The project will develop new statistical models for avian population monitoring data. The BTO collects capture-recapture data from volunteers who use mist nets to harmlessly capture and ring birds, as part of a long-running monitoring programme of common UK resident breeding birds called the Constant Effort Scheme (CES). Avian population monitoring provides valuable evidence of the general health of UK wildlife and informs conservation efforts, as these populations are affected by the pressures of climate change, habitat loss and pesticide use. Moreover, capture-recapture data are ecologically important because they enable estimation of key demographic parameters such as abundance and survival. This project will develop models for the CES data which account for spatial correlation across the multiple CES sites that span the British Isles. Further, the capture-recapture data are currently aggregated to an annual level, so a further aim of the project is to develop models which use the information from within-season recaptures to provide insight into transience and population growth.
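For intuition, the sketch below shows the simplest capture-recapture calculation: a two-occasion, closed-population abundance estimate using Chapman's bias-corrected version of the Lincoln-Petersen estimator. The simulated study is purely illustrative and far simpler than the spatial, within-season models the project will develop.

```python
import numpy as np

def chapman_estimate(n1, n2, m2):
    """Chapman's bias-corrected Lincoln-Petersen estimator of
    abundance from a two-occasion closed-population study."""
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1

rng = np.random.default_rng(7)
N, p = 1000, 0.15                  # true abundance and capture probability
caught1 = rng.random(N) < p        # birds ringed on the first occasion
caught2 = rng.random(N) < p        # birds captured on the second occasion
n1, n2 = caught1.sum(), caught2.sum()
m2 = (caught1 & caught2).sum()     # ringed birds that were recaptured
print(f"Estimated abundance: {chapman_estimate(n1, n2, m2):.0f} (true N = {N})")
```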
Identifying epidemic dynamics from data using machine learning
Cassandra Durr
Supervisors Lloyd Chapman, Chris Jewell
Academic Partner Oslo University
Epidemic dynamics can be characterised by the rates at which individuals move between different health states, such as rates of infection, recovery, hospitalisation, and death. The numbers of people transitioning between different states per unit time are known as the fluxes. In traditional epidemic modelling, the mathematical form of these fluxes is typically assumed, which can prevent models from accurately capturing complex real-world dynamics.
This project aims to develop novel methods for learning the mathematical equations underlying the fluxes directly from observational data. By learning these dynamics directly from data, we avoid imposing restrictive assumptions and better reflect reality. Learning mathematical expressions for the fluxes also ensures interpretability, which is key for policymakers who rely on epidemic models to make informed public health decisions.
This project will develop system identification methods that address the challenges associated with epidemic data, such as noisy and partially observed data. While the methods will be developed for epidemic systems, they will be applicable to any dynamic system where entities flow into and out of defined states, offering a generalisable approach to dynamical system identification.
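The sketch below shows the flavour of such an approach, in the spirit of sparse-regression system identification (e.g. SINDy); the project's actual methods may differ. It simulates a simple SIR epidemic with a known infection flux, then recovers that flux from the data using a candidate term library and thresholded least squares. The noise-free data and the threshold value are idealisations.

```python
import numpy as np

# Simulate a simple SIR epidemic (Euler steps) with known fluxes.
beta, gamma, dt = 0.3, 0.1, 0.1
S, I = [0.99], [0.01]
for _ in range(600):
    infection = beta * S[-1] * I[-1]          # true infection flux
    recovery = gamma * I[-1]                  # true recovery flux
    S.append(S[-1] - dt * infection)
    I.append(I[-1] + dt * (infection - recovery))
S, I = np.array(S), np.array(I)

# Candidate library of terms the flux might be built from.
library = np.column_stack([S, I, S * I, I ** 2])
names = ["S", "I", "S*I", "I^2"]
dSdt = np.gradient(S, dt)                     # numerical derivative of S

# One pass of sequentially thresholded least squares: fit, zero out
# small coefficients, then refit on the surviving terms.
coef = np.linalg.lstsq(library, dSdt, rcond=None)[0]
keep = np.abs(coef) > 0.05
coef = np.zeros_like(coef)
coef[keep] = np.linalg.lstsq(library[:, keep], dSdt, rcond=None)[0]
print({n: round(c, 3) for n, c in zip(names, coef) if c != 0.0})
# Expected: dS/dt is approximately -0.3 * S*I, the true infection flux.
```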
Novel methods for real-time multivariate anomaly detection in data streams
Harry Ellingham
Supervisors Idris Eckley, Florian Pein
Industrial Partner Morgan Stanley
My research focuses on developing statistical methods to spot unusual or unexpected behaviour in real-world data. We call these unusual observations anomalies, and the task of detecting them has received significant attention in recent years. However, real data is often messy: it can behave unpredictably, change over time, or show patterns and dependencies between time series that are hard to capture with simple tools. Because of this, anomaly detection methods can easily be confused: they may flag this complicated background variation as anomalous, or lose the ability to detect genuinely anomalous behaviour.
Our goal is to create tools that continue to work even when the data is noisy, complicated, or constantly changing. Improving these methods would help people working in areas such as finance, environmental science, and other fields that rely on large streams of data to make decisions.
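As a small illustration of the kind of streaming detector involved, here is a baseline sketch (not a method from this project): it flags points that sit far from a rolling median, scaled by a rolling MAD so the detector adapts as the background drifts. The window size and threshold are arbitrary choices.

```python
import numpy as np
from collections import deque

def stream_anomalies(stream, window=50, threshold=4.0):
    """Flag points far from a rolling median, in units of the rolling
    MAD, so the detector tracks a slowly drifting background."""
    recent, flags = deque(maxlen=window), []
    for t, x in enumerate(stream):
        if len(recent) == window:
            med = np.median(recent)
            mad = np.median(np.abs(np.array(recent) - med)) + 1e-9
            if abs(x - med) / (1.4826 * mad) > threshold:
                flags.append(t)
        recent.append(x)
    return flags

rng = np.random.default_rng(3)
x = np.sin(np.arange(500) / 60) + 0.1 * rng.normal(size=500)  # drifting baseline
x[[120, 340]] += 2.5                                          # injected anomalies
print("Flagged:", stream_anomalies(x))
```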
Uncertainty quantification when combining forecasts and simulation for long-term problems
Rebekah Fearnhead
Supervisors Luke Rhodes-Leader, Ivan Svetunkov
Industrial Partner UK Cabinet Office
One of the areas that the Joint Data Analysis Centre (JDAC) is interested in is how the needs of the healthcare system will change in the future. This problem can be split into two main areas: forecasting future demand, and simulating the operational aspects of the system.
When looking at demand, one approach is to use conventional forecasting techniques to predict how admission numbers will change based on historical admissions data. These forecasts can be improved by including external information, for example demographic information or the presence of Covid. Being able to forecast the population in different areas of the UK, and the internal migration of different age groups, should also help to improve the accuracy of these forecasts further.
When looking at the operational aspect, a main area of interest is the effect that different government policies can have on the performance of hospitals. This can be investigated by simulating the operation and performance of hospitals under different scenarios, for example by combining the demand forecasts with simulation models of hospital patient flow.
In both forecasting and simulation, it is important to quantify the uncertainty in the results. This matters for JDAC because, if these results are to inform government policy, the uncertainty indicates how much confidence can be placed in the forecasted scenarios. A key challenge when using the results of a forecast as input to a simulation model is that uncertainty in the inputs propagates through to uncertainty in the simulation results.
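The toy sketch below illustrates this propagation: sampled admission paths, representing forecast uncertainty, are pushed through a deliberately crude hospital-flow simulation, giving an interval for peak occupancy rather than a single number. Every quantity here (the trend, error sizes, discharge fraction) is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
horizon, n_samples = 52, 1000
base_forecast = 100 + 0.5 * np.arange(horizon)   # illustrative weekly admissions

peaks = []
for _ in range(n_samples):
    # One plausible demand path: a shared level error (forecast
    # uncertainty) plus week-to-week Poisson noise.
    level_error = rng.normal(0, 5)
    admissions = rng.poisson(np.maximum(base_forecast + level_error, 0))
    # Crude flow model: 30% of current patients are discharged each week.
    occupancy, peak = 0.0, 0.0
    for a in admissions:
        occupancy = 0.7 * occupancy + a
        peak = max(peak, occupancy)
    peaks.append(peak)

lo, hi = np.percentile(peaks, [5, 95])
print(f"Peak occupancy, 90% interval: [{lo:.0f}, {hi:.0f}]")
```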
Stochastic programming approaches for the storage location assignment problem
Mark Holcroft
Supervisors Jamie Fairbrother, Luke Rhodes-Leader
Industrial Partner Tesco
A key process in warehouse operations is order picking, whereby Stock-Keeping Units (SKUs) are picked from their storage slots to satisfy incoming orders. Factors to be considered include placing highly demanded products near to the input/output point, and placing products that are often ordered together near to each other. Order picking is typically the most costly process in a warehouse, making efficiency a priority. The problem of assigning products to storage locations is known as the Storage Location Assignment Problem (SLAP), and it is difficult because we must consider all possible assignments, as well as all routes pickers can take between storage locations.
Most existing models in this area are sequential, involving first assigning SKUs to slots, and then choosing picker routes. Whilst this makes solving easier, more recent works have integrated the two stages to solve them together, achieving much better solutions. However, this approach is much more computationally expensive, and scaling to larger instances is a challenge which has yet to be truly overcome.
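For concreteness, here is a toy version of the sequential baseline just described: a greedy "ABC" slotting stage that places the most-picked SKUs closest to the input/output point, followed by a crude single-aisle routing cost. All sizes and demands are invented, and this is the baseline that integrated approaches improve on, not a model from this project.

```python
import numpy as np

rng = np.random.default_rng(4)
n_skus = 20
slot_dist = np.arange(1, n_skus + 1)      # slot distances from the I/O point
demand = rng.poisson(5, size=n_skus)      # historical pick counts per SKU

# Stage 1 (assignment): most-picked SKUs get the closest slots.
rank = np.argsort(-demand)
slot_of = np.empty(n_skus, dtype=int)
slot_of[rank] = np.arange(n_skus)

# Stage 2 (routing): single-aisle proxy -- each order costs a walk out
# to its farthest required slot and back.
def order_cost(skus):
    return 2 * max(slot_dist[slot_of[s]] for s in skus)

orders = [rng.choice(n_skus, size=3, replace=False) for _ in range(100)]
print("Total walk distance:", sum(order_cost(o) for o in orders))
```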
During my PhD, our focus will be on working with our industry partner Tesco to develop new models for solving the SLAP. To reflect the random nature of orders in such problems, we will use stochastic optimisation, which has not yet been applied in this area.
The main aim of our project is to increase the warehouse sizes for which we are able to find solutions. One avenue for doing this is using input reduction techniques, whereby we efficiently choose orders to reduce the problem size whilst maintaining good solution quality. We also aim to include problem-specific operational constraints, such as one-way systems, to further simplify the problem. Using these alongside intelligent model formulation, we hope to produce models with genuine practical value in real-world settings.
Optimisation: Deterministic, stochastic and in-between
Jimmy Lin
Supervisors Jamie Fairbrother, Adam Letchford
Industrial Partner Shell
Shell regularly faces optimisation problems in which some of the parameters are uncertain. This includes, for example, problems involving the routing and scheduling of ships, and problems concerned with planning portfolios of renewable and non-renewable energy sources. At present, they typically assume that all parameters take their expected values, and solve the problems using deterministic approaches (such as linear or mixed-integer linear programming). They are now looking for alternative approaches, which explicitly take uncertainty into account.
A key limitation of standard approaches to optimisation under uncertainty, such as stochastic programming and robust optimisation, is that they are often too time-consuming to be practical. The goal of this project is to explore ways to reduce the computational burden of these methods by using scenario generation methods and/or metaheuristic algorithms to obtain fast, though not exact, solutions to stochastic programs.
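As a minimal illustration of the scenario-based idea (a toy problem, not one of Shell's), the sketch below applies sample average approximation: fix a first-stage decision, average the second-stage cost over sampled scenarios, and pick the best decision on a grid. The costs and demand distribution are invented.

```python
import numpy as np

rng = np.random.default_rng(2)
build_cost, shortfall_cost = 1.0, 4.0
scenarios = rng.lognormal(mean=3.0, sigma=0.5, size=500)  # sampled demands

def saa_cost(capacity):
    """First-stage build cost plus the scenario-average recourse
    cost of any unmet demand."""
    shortfall = np.maximum(scenarios - capacity, 0.0)
    return build_cost * capacity + shortfall_cost * shortfall.mean()

grid = np.linspace(0.0, 80.0, 801)
best = grid[np.argmin([saa_cost(c) for c in grid])]
print(f"SAA-optimal capacity: {best:.1f}")
```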
Consumer healthcare supply chain visibility and multi-echelon inventory optimisation
Niharika Peddinenikalva
Supervisors Anna-Lena Sachs, Rob Shone
Industrial Partner Haleon
This project is in partnership with Haleon UK plc, one of the leading consumer healthcare companies globally. With a complex global supply chain, Haleon supplies a range of consumer healthcare products to supermarkets, pharmacies, retail shops and other outlets. Multi-echelon inventory optimisation controls the flow of products through such supply chains, which is crucial to ensure that products reach customers on time and efficiently.
While inventory optimisation is useful in any industry, the pharmaceutical and healthcare industries have a unique combination of features and requirements. These include batch production and distribution, high service levels supported by mitigating measures such as emergency expediting orders, and varying regulatory requirements.
The main aim of this project is to use stochastic dynamic optimisation methods to develop suitable solutions (e.g. ordering policies) for large-scale supply chains with unique features relevant to the healthcare industry. The scope of the project includes the design and development of algorithms that can solve multi-echelon optimisation problems which have not been widely studied in the healthcare context.
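For intuition, the sketch below simulates the classic single-stage building block that multi-echelon methods chain across stages: a base-stock (order-up-to) policy with a fixed replenishment lead time, evaluated by its fill rate. The demand rate, lead time and lost-sales assumption are illustrative choices, not features of Haleon's supply chain.

```python
import numpy as np

def fill_rate(base_stock, demand_rate=10, lead_time=2, n_weeks=10000, seed=0):
    """Simulate an order-up-to policy with a fixed lead time and
    return the fraction of demand met from stock (lost sales)."""
    rng = np.random.default_rng(seed)
    on_hand, pipeline = base_stock, [0] * lead_time
    met = total = 0
    for _ in range(n_weeks):
        on_hand += pipeline.pop(0)              # receive the oldest order
        d = rng.poisson(demand_rate)
        met += min(d, on_hand)
        total += d
        on_hand = max(on_hand - d, 0)           # unmet demand is lost
        position = on_hand + sum(pipeline)
        pipeline.append(base_stock - position)  # order up to the base stock
    return met / total

for S in (25, 30, 35):
    print(f"Base stock {S}: fill rate {fill_rate(S):.3f}")
```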
EPSRC Research Area: Operational Research
The Time Window Assignment Vehicle Routing Problem
Billie-Jo Powers
Supervisors Thu Dang, Dong Li
Academic Partner Northwestern University
Transportation plays a vital role across social and economic activities today and is a key contributor to the UK economy. As a result, planning the best routes for delivery vehicles is very important. However, this is not easy as there are many restrictions to consider, such as traffic, time limits, fuel use and customer locations. The Vehicle Routing Problem (VRP) is a well-studied combinatorial optimisation problem addressing this challenge.
In many contexts, such as deliveries from warehouses to shops, deliveries are made on a regular basis, and a time window must be assigned to each shop well in advance, even though the demand is unknown and may vary. Delivery schedules must follow the set time window for each shop. The Time Window Assignment Vehicle Routing Problem (TWAVRP) is a variant of the VRP addressing this situation. Since the time window allocation must occur before demand is known, this is a two-stage stochastic problem. The TWAVRP is a strongly NP-hard problem. This project aims to develop effective solution methods for the TWAVRP.
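The two-stage structure can be illustrated with a toy example, far simpler than a real TWAVRP and with invented geometry and demands: fix an AM/PM window assignment for four shops on a single road, then estimate its expected routing cost by sampling demand scenarios.

```python
import numpy as np

rng = np.random.default_rng(5)
position = np.array([2, 5, 7, 10])   # shop distances along one road
capacity = 30                        # vehicle capacity per trip

def expected_cost(window, n_scenarios=2000):
    """Average routing cost of a fixed AM/PM assignment: each window
    needs out-and-back trips to its farthest shop, with an extra trip
    whenever the window's total demand exceeds vehicle capacity."""
    costs = []
    for _ in range(n_scenarios):
        demand = rng.poisson(8, size=len(position))
        cost = 0.0
        for w in (0, 1):                        # 0 = AM window, 1 = PM window
            shops = window == w
            if shops.any():
                trips = int(np.ceil(demand[shops].sum() / capacity))
                cost += trips * 2 * position[shops].max()
        costs.append(cost)
    return np.mean(costs)

for assignment in ([0, 0, 1, 1], [0, 1, 0, 1]):
    print(assignment, "->", round(expected_cost(np.array(assignment)), 1))
```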
Modelling complex dependency structures in count time series data
Roberto Vasquez Martinez
Supervisors Israel Martinez Hernandez, Emma Eastoe
Academic Partner University College Dublin
In many scientific fields, such as medicine, sociology, finance, epidemiology, and environmental sciences, it is natural to have records of counts over an observed time horizon. Examples of such data include the daily number of transactions in a stock market, the monthly number of viral disease cases, or utilisation counts for healthcare services. This type of data is known as a count time series. Furthermore, in most of these studies, it is common to have information on several variables that describe the phenomena. For instance, in finance, it is common to analyse transaction counts across different assets simultaneously, and in epidemiology, in addition to the recorded number of cases, there is often information on other covariates, such as demographics or comorbidities of the studied population. In statistics, the study of these two important aspects together is known as multivariate count time series modelling.
Given the count-valued nature of the data, new statistical models are required. Current statistical methods for multivariate count time series have several limitations that need to be addressed. For example, there is a scarcity of count time series methods for modelling the dependence structure among the variables in the analysis. In addition, current methods are unsuitable for high-dimensional applications, i.e., where the number of variables involved is of the order of hundreds or thousands.
The project aims to develop novel models for multivariate count time series, combining rigorous theoretical development, computational and practical advances, and motivation from real applications.
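As a univariate starting point, a building block rather than the multivariate, high-dimensional models the project targets, the sketch below simulates a simple Poisson autoregression of INGARCH type, where today's expected count depends on yesterday's intensity and observed count. The parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
omega, alpha, beta = 2.0, 0.4, 0.3   # illustrative INGARCH(1,1) parameters
n = 300
lam = np.empty(n)                    # conditional intensity
y = np.empty(n, dtype=int)           # observed counts
lam[0] = omega / (1 - alpha - beta)  # stationary mean
y[0] = rng.poisson(lam[0])
for t in range(1, n):
    lam[t] = omega + alpha * lam[t - 1] + beta * y[t - 1]
    y[t] = rng.poisson(lam[t])
print("Sample mean:", round(y.mean(), 2), "vs stationary mean:", round(lam[0], 2))
print("Lag-1 autocorrelation:", round(float(np.corrcoef(y[:-1], y[1:])[0, 1]), 2))
```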
Novel anomaly detection methods for assuring mobile usage data
Fiona Wilson
Supervisors David Hofmeyr, Idris Eckley
Industrial Partner Tesco Mobile
Mobile phone networks have strict terms and conditions which outline what customers can use their network for. However, a minority of users misuse their SIM cards, for example by sending large numbers of text messages (spamming), making a high volume of phone calls, or making a lot of contact with international or premium numbers. Spam messages are often associated with scams, which cost consumers millions of pounds each year. For this reason, and to protect the reputation of mobile network companies, we wish to develop methods to quickly identify spamming on the network so that the mobile network provider can take the necessary actions to protect customers.
This project is in partnership with Tesco Mobile to develop new methods of detecting this network misuse by looking for anomalous patterns in their data. While some network misuse may be obvious, malicious users are becoming more aware of simple detection methods, so more sophisticated ones are needed. These new anomaly detection methods aim to detect unusual patterns both within and between customer usage profiles. As part of this, we also want to characterise normal network usage to capture different typical usage profiles.
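A toy sketch of this two-part idea follows: characterise typical profiles by clustering usage features, then flag SIMs that sit far from every cluster centre. The features, cluster count, threshold and data are all hypothetical, not Tesco Mobile's.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(8)
# Hypothetical weekly usage features per SIM: [texts sent, calls made].
normal = np.vstack([rng.normal([50, 30], 10, size=(500, 2)),
                    rng.normal([200, 5], 20, size=(200, 2))])
spammers = rng.normal([2000, 2], 100, size=(5, 2))
X = np.vstack([normal, spammers])

# Characterise typical usage with k-means on the normal profiles, then
# flag any SIM far from every cluster centre (illustrative threshold).
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(normal)
dist = np.linalg.norm(X[:, None] - km.cluster_centers_, axis=2).min(axis=1)
print("Flagged SIMs:", np.where(dist > 500)[0], "(spammers are indices 700-704)")
```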
EPSRC Research Area: Statistics