Gaussian process regression, live

The demo above is exact GP regression: invert the kernel matrix and the posterior falls out in closed form. That is the right tool for a screenful of points and entirely the wrong one for years of hourly plant data, where the cost of the inversion, cubic in the number of training points, becomes the bottleneck. When I model things like the specific energy consumption of a reformer, I use sparse variational GPs in GPyTorch instead: the dataset is summarised by a few hundred inducing points, and the model is fitted by stochastic optimisation on a GPU.
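For reference, the closed-form computation behind exact GP regression can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code behind the demo: it assumes a zero prior mean, one-dimensional inputs and an RBF kernel, and the names `rbf_kernel` and `exact_gp_posterior` are made up for this example. The Cholesky factorisation is the step whose cost grows cubically with the number of training points.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def exact_gp_posterior(x_train, y_train, x_test, noise=1e-2, **kernel_args):
    """Closed-form posterior mean and variance of a zero-mean GP at x_test.

    `noise` is the observation noise variance added to the kernel diagonal.
    """
    K = rbf_kernel(x_train, x_train, **kernel_args) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, **kernel_args)
    # Prior variance at each test point (the diagonal of the test kernel matrix).
    prior_var = np.full(len(x_test), kernel_args.get("variance", 1.0))

    # The O(n^3) step: factorise the n x n training kernel matrix.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))

    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = prior_var - np.sum(v**2, axis=0)
    return mean, var
```

Called as `mean, var = exact_gp_posterior(x_train, y_train, x_test)`, it returns the posterior mean and the pointwise variance of the latent function; far from the training data the variance returns to the prior variance.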

Most of the craft sits in the decisions around the model rather than in the model itself. I have run kernel experiments with sums and products of Matérn, linear and RBF components to separate smooth trends from rougher behaviour, and more often than not a well-tuned RBF kernel matches them, at which point the simple kernel wins. The training objective deserves more attention than it usually gets: a variational ELBO generalises broadly across the whole operating envelope, but when what you actually care about is accuracy around normal operation, a predictive log-likelihood objective fits that region noticeably better. Even the inducing points reward some care. Rather than scattering them at random, I stratify them across the extremes, the quartiles and their neighbours, so the sparse model keeps the structure that matters.

And before any of that, the unglamorous work that decides whether the model is any good at all: converting every stream to consistent units, filtering out the periods where an instrument or an upstream calculation is known to be wrong, and resisting the temptation to let a flexible model explain away artefacts that a data fix should remove. A Gaussian process will happily model your bugs, with beautifully calibrated uncertainty.
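The first two of those steps are simple enough to sketch. The following is a minimal NumPy example, with everything in it assumed for illustration: the conversion table, the choice of base units and the function names `to_base_units` and `mask_known_bad` are hypothetical, and a real plant would have its own tag list and its own record of unreliable periods.

```python
import numpy as np

# Hypothetical conversion factors into assumed base units (kg/h for flow, kW for power).
TO_BASE_UNITS = {"kg/h": 1.0, "t/h": 1000.0, "kW": 1.0, "MW": 1000.0}

def to_base_units(values, unit):
    """Convert a stream recorded in `unit` to the base unit for its quantity."""
    return np.asarray(values, dtype=float) * TO_BASE_UNITS[unit]

def mask_known_bad(timestamps, bad_periods):
    """Boolean mask that is False inside any [start, end) period known to be wrong."""
    timestamps = np.asarray(timestamps)
    keep = np.ones(timestamps.shape, dtype=bool)
    for start, end in bad_periods:
        keep &= ~((timestamps >= start) & (timestamps < end))
    return keep
```

The mask is applied before any modelling, for example `x_clean = x[keep]`, so that the excluded periods never reach the GP in the first place.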

A model that knows what it doesn't know.

What you're looking at

The regression

The experiment design

At production scale