Experiments · 01 · Probabilistic ML
A model that knows what it doesn't know.
A Gaussian process does not just predict; it quantifies its own uncertainty at every point. That is why I reach for GPs on industrial problems where data is expensive: every measurement costs a batch, a lab hour, or an intervention on a running plant. Click anywhere on the chart to add an observation and watch the posterior re-fit, live.
What you're looking at
The regression
The copper line is the posterior mean; the shaded region is ±2 standard deviations, roughly a 95% band. Both come from conditioning a multivariate Gaussian with an RBF kernel on your clicks, computed exactly in the page by Cholesky decomposition, with no libraries. Shorten the lengthscale and the model stops generalising between points; raise the noise and it stops trusting your points.
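The conditioning step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the page's own implementation (which uses no libraries): it assumes a zero-mean prior and 1-D inputs, and the names `rbf_kernel`, `gp_posterior`, `lengthscale`, `variance` and `noise` are illustrative choices, with `noise` meaning the observation noise variance.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, lengthscale=1.0, variance=1.0, noise=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP at x_test.

    `noise` is the observation noise variance added to the kernel diagonal.
    """
    K = rbf_kernel(x_train, x_train, lengthscale, variance)
    K += noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, lengthscale, variance)

    # Cholesky factor K = L L^T; solve against it instead of forming K^-1.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha

    # Posterior variance of the latent function: k(x*, x*) - v^T v with L v = k*.
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))  # clip tiny negative round-off

# Hypothetical observations standing in for clicks on the chart.
x_train = np.array([-2.0, 0.0, 1.5])
y_train = np.sin(x_train)
x_test = np.linspace(-3.0, 3.0, 61)

mean, std = gp_posterior(x_train, y_train, x_test)
lower, upper = mean - 2.0 * std, mean + 2.0 * std  # the shaded band
```

Passing a smaller `lengthscale` makes the band reopen quickly between observations, and a larger `noise` keeps the mean from passing exactly through them.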
The experiment design
"Suggest next experiment" marks where the posterior uncertainty is highest, which is the core move of Bayesian optimisation. In industrial R&D I have run exactly this loop: a first design-of-experiments pass for the main effects, then Bayesian-optimised iterations to fill in the gaps, so that every extra experiment teaches the model as much as possible. A model that can flag its own blind spots is also a model you can responsibly put in front of a live process.
At production scale
The demo above is exact GP regression: factorise the kernel matrix and the posterior falls out in closed form. That is the right tool for a screenful of points and entirely the wrong one for years of hourly plant data, because the cost of that factorisation grows with the cube of the number of observations. When I model things like the specific energy consumption of a reformer, I use sparse variational GPs in GPyTorch instead: the dataset is summarised by a few hundred inducing points, and the model is fitted by stochastic optimisation on a GPU.
Most of the craft sits in the decisions around the model rather than in the model itself. I have run kernel experiments with sums and products of Matérn, linear and RBF components to separate smooth trends from rougher behaviour, and more often than not a well-tuned RBF kernel matches them, at which point the simple kernel wins. The training objective deserves more attention than it usually gets: a variational ELBO generalises broadly across the whole operating envelope, but when what you actually care about is accuracy around normal operation, a predictive log-likelihood objective fits that region noticeably better. Even the inducing points reward some care. Rather than scattering them at random, I stratify them across the extremes, the quartiles and their neighbours, so the sparse model keeps the structure that matters.
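One way to read the stratification idea is sketched below in NumPy. The details are assumptions, since the text does not specify them: rows are ranked by a single column (here a hypothetical target `y`), the anchors are the minimum, the three quartiles and the maximum, and "neighbours" means the `n_neighbours` rows on either side of each anchor in rank order.

```python
import numpy as np

def stratified_inducing_indices(values, n_neighbours=10):
    """Row indices at the extremes and quartiles of `values`, plus rank-neighbours."""
    order = np.argsort(values)  # row indices sorted by value
    n = len(order)

    # Rank positions of the minimum, Q1, median, Q3 and maximum.
    fractions = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
    anchors = np.round(fractions * (n - 1)).astype(int)

    picked = set()
    for a in anchors:
        lo = max(0, a - n_neighbours)
        hi = min(n, a + n_neighbours + 1)
        picked.update(order[lo:hi].tolist())
    return np.array(sorted(picked))

# Hypothetical dataset: 5000 rows, 3 input features, one target.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=5000)

idx = stratified_inducing_indices(y, n_neighbours=20)
Z = X[idx]  # candidate initial inducing point locations
```

Compared with a uniform random subset, this guarantees that the tails of the ranked column are represented among the inducing points, at the cost of under-sampling the regions between the anchors; mixing in a random remainder would be one way to balance the two.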
And before any of that, the unglamorous work that decides whether the model is any good at all: converting every stream to consistent units, filtering out the periods where an instrument or an upstream calculation is known to be wrong, and resisting the temptation to let a flexible model explain away artefacts that a data fix should remove. A Gaussian process will happily model your bugs, with beautifully calibrated uncertainty.