
GPflow multiple independent realizations of same GP, irregular sampling times/lengths


In GPflow I have multiple time series whose sampling times are not aligned, and the series may have different lengths (longitudinal data). I assume they are independent realizations of the same GP. What is the right way to handle this with SVGP, and more generally with GPflow? Do I need to use coregionalization? The coregionalization notebook assumes correlated trajectories, whereas I want a shared mean/kernel but independent realizations.


Solution

  • Yes, the Coregion kernel implemented in GPflow is what you can use for your problem.

    Let's set up some data from the generative model you describe, with different lengths for the time series:

    import numpy as np
    import gpflow
    import matplotlib.pyplot as plt
    
    Ns = [80, 90, 100]  # number of observations for three different realizations
    Xs = [np.random.uniform(0, 10, size=N) for N in Ns]  # observation locations
    
    # three different draws from the same GP:
    k = gpflow.kernels.Matern52(variance=2.0, lengthscales=0.5)  # kernel
    
    Ks = [k(X[:, None]).numpy() for X in Xs]  # Gram matrices, one per series
    Ls = [np.linalg.cholesky(K + 1e-9 * np.eye(len(K))) for K in Ks]  # small jitter for numerical stability
    vs = [np.random.randn(N, 1) for N in Ns]  # standard normal draws
    fs = [(L @ v).squeeze(axis=-1) for L, v in zip(Ls, vs)]  # samples via f = L v
    
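    Since matplotlib is already imported, a quick plot of the three draws is a useful sanity check (a minimal sketch; the styling is arbitrary):

    for i, (X, f) in enumerate(zip(Xs, fs)):
        plt.plot(X, f, ".", label=f"series {i}")
    plt.xlabel("t")
    plt.ylabel("f(t)")
    plt.legend()
    plt.show()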

    To actually set up the training data for the GPflow GP model:

    # output indicator for the observations: which timeseries is this?
    os = [o * np.ones(N) for o, N in enumerate(Ns)]  # [0 ... 0, 1 ... 1, 2 ... 2]
    
    # now assemble the three time series into a single data set:
    allX = np.concatenate(Xs)
    allo = np.concatenate(os)
    allf = np.concatenate(fs)
    X = np.c_[allX, allo]
    Y = allf[:, None]
    assert X.shape == (sum(Ns), 2)
    assert Y.shape == (sum(Ns), 1)
    
    # now let's set up a copy of the original kernel:
    k2 = gpflow.kernels.Matern52(active_dims=[0])  # same form as k above, but with hyperparameters to be learned
    
    # and a Coregionalization kernel that effectively says they are all independent:
    kc = gpflow.kernels.Coregion(output_dim=len(Ns), rank=1, active_dims=[1])
    kc.W.assign(np.zeros(kc.W.shape))
    kc.kappa.assign(np.ones(kc.kappa.shape))
    gpflow.set_trainable(kc, False)  # we want W and kappa fixed
    

    The Coregion kernel defines a covariance matrix B = W Wᵀ + diag(kappa). By setting W=0 we prescribe zero correlations between outputs (independent realizations), and kappa=1 (actually the default) ensures that the variance hyperparameter of the copy of the original kernel remains interpretable.
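    With W=0 and kappa=1, B is just the 3×3 identity; you can confirm this directly from the assigned parameters:

    W = kc.W.numpy()
    kappa = kc.kappa.numpy()
    B = W @ W.T + np.diag(kappa)
    print(B)  # identity matrix: unit variance per output, zero cross-correlations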

    Now construct the actual model and optimize hyperparameters:

    k2c = k2 * kc
    
    m = gpflow.models.GPR((X, Y), k2c, noise_variance=1e-5)  # the generated data is noise-free, so start with a small noise variance
    
    opt = gpflow.optimizers.Scipy()
    opt.minimize(m.training_loss, m.trainable_variables, compile=False)
    

    which recovers the initial variance and lengthscale hyperparameters pretty well.
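    You can check this with GPflow's built-in parameter summary:

    gpflow.utilities.print_summary(m)  # compare k2's variance/lengthscales with k's (2.0 and 0.5)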

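    The question also mentions SVGP: the product kernel k2c carries over unchanged; the inducing inputs just need the extra output-index column as well. Here is a minimal sketch (the inducing-point layout, a shared time grid replicated per output, is an arbitrary illustrative choice):

    Z_times = np.linspace(0, 10, 20)
    Z = np.vstack([np.c_[Z_times, o * np.ones_like(Z_times)] for o in range(len(Ns))])

    m_svgp = gpflow.models.SVGP(
        kernel=k2c,
        likelihood=gpflow.likelihoods.Gaussian(),
        inducing_variable=Z,
    )
    opt = gpflow.optimizers.Scipy()
    opt.minimize(m_svgp.training_loss_closure((X, Y)), m_svgp.trainable_variables)

    For a dataset this small, GPR is preferable; SVGP pays off once you have enough observations to need minibatching with a stochastic optimizer.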
    If you want to predict, you have to provide the extra "output" column in the Xnew argument to m.predict_f(), e.g. as follows:

    Xtest = np.linspace(0, 10, 100)
    Xtest_augmented = np.c_[Xtest, np.zeros_like(Xtest)]
    f_mean, f_var = m.predict_f(Xtest_augmented)
    

    (Note that because W=0 makes the outputs independent, the output column selects which time series you condition on: the zeros above give the posterior for the first series, informed only by its own observations. A new, entirely unobserved realization would simply have the prior distribution.)
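    To visualize the fit for the first series (a sketch reusing the variables defined above):

    mu = f_mean.numpy().squeeze(axis=-1)
    sd = np.sqrt(f_var.numpy().squeeze(axis=-1))
    plt.plot(Xs[0], fs[0], "k.", label="series 0 observations")
    plt.plot(Xtest, mu, label="posterior mean")
    plt.fill_between(Xtest, mu - 2 * sd, mu + 2 * sd, alpha=0.3, label="±2 std")
    plt.legend()
    plt.show()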

    If your input was more than one-dimensional, you could set active_dims=list(range(X.shape[1] - 1)) for the first kernel(s) and active_dims=[X.shape[1]-1] for the Coregion kernel.
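    For example, with a hypothetical two-dimensional input (so the augmented X has three columns: two inputs plus the output index), the kernel construction would become:

    D = 2  # hypothetical input dimensionality
    k2 = gpflow.kernels.Matern52(active_dims=list(range(D)))  # acts on the input columns
    kc = gpflow.kernels.Coregion(output_dim=len(Ns), rank=1, active_dims=[D])  # acts on the output-index column
    k2c = k2 * kc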