Tags: scikit-learn, statistics, regression, gaussian, gpflow

Gaussian process regression estimates of confidence intervals


This may be an odd question, but when a Gaussian process regression sees a bunch of noisy data without much of a signal, what does it do? Below I take a bunch of noisy data and run two different implementations of GPR, and both produce super tiny confidence intervals. Is there a good reason why this is the case? My intuition tells me the confidence intervals should be larger. Are GPRs really that confident in their estimate of the mean? Additionally, is there an appropriate way to pad the variance estimates aside from adding a white noise kernel?

import numpy as np
import matplotlib.pyplot as plt
import gpflow
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel, Matern, RBF

## some data
X1 = np.array([ 2.,  2.,  3.,  4.,  5.,  5.,  5.,  6.,  6.,  6.,  7.,  7.,  7.,
        8.,  8.,  8.,  8.,  8.,  9.,  9.,  9.,  9., 10., 11., 11., 12.,
       12., 12., 13., 13., 14., 14., 15., 15., 15., 16.])

Y1  = np.array([-0.70007257, -0.69388464, -0.63062014, -0.72834303, -0.67526754,
        1.00259286, -0.96141351, -0.08295884,  1.0727982 , -2.29816347,
       -0.61594418,  1.13696593, -2.18716473, -0.35037363,  1.96273672,
        1.31621059, -1.88566144,  1.80466116, -0.79665828,  2.40720146,
        1.83116473, -1.67224082, -0.96766061, -0.67430408,  1.79624005,
       -1.41192248,  1.01754167,  0.37327703, -1.1195072 ,  0.71855107,
       -1.16906878,  0.99336417,  1.12563488, -0.36836713,  0.12574823,
        0.23294988])

## gpflow
model = gpflow.models.GPR(X=X1[:,None],
                         Y= Y1[:,None], kern=gpflow.kernels.RBF(1))

gpflow.train.ScipyOptimizer().minimize(model)

## scikit
kernel = RBF()
gpr = GaussianProcessRegressor(kernel=kernel,
        random_state=0).fit(X= X1[:,None], y= Y1[:, None])

# plot function
def plot(m, gpflow=True):
    plt.figure(figsize=(8, 4))
    xtest = np.linspace(np.min(X1), np.max(X1), 20)[:, None]
    plt.plot(X1, Y1, 'x', mew=2)

    if gpflow:
        # GPflow: predict_f returns the posterior mean and variance of the latent function
        mu, var = m.predict_f(xtest)

        plt.plot(xtest, mu, color="green", lw=2, label="GPflow")
        plt.fill_between(xtest[:, 0],
                         (mu - 2 * np.sqrt(var))[:, 0],
                         (mu + 2 * np.sqrt(var))[:, 0],
                         color="lightgreen", alpha=0.4)
    else:
        # scikit-learn: predict returns the mean with shape (n, 1) and the std with shape (n,)
        mu, se = m.predict(xtest, return_std=True)

        plt.plot(xtest, mu, color="red", lw=2, label="scikit-learn")
        plt.fill_between(xtest[:, 0],
                         mu[:, 0] - 2 * se,
                         mu[:, 0] + 2 * se,
                         color="red", alpha=0.4)

    plt.legend()

[GPflow estimates plot]

[scikit-learn estimates plot]


Solution

  • It's often helpful to look at the actual optimized values of your model hyperparameters - in this case noise variance, kernel variance and kernel lengthscale:

                                 class           ...                             value
    GPR/kern/lengthscales    Parameter           ...                3.7149993613788737
    GPR/kern/variance        Parameter           ...            2.0572871322469534e-06
    GPR/likelihood/variance  Parameter           ...                1.5461369937869296
    

    So the GP explains everything away as noise. In this case the actual value of the lengthscale is pretty arbitrary; it's the tiny kernel variance that matters. (If you use predict_y instead of predict_f you should get a confidence interval that covers most of the observations.) The "RBF" kernel (I prefer "squared exponential" - every stationary kernel describes radial basis functions...) makes very strong smoothness assumptions on the functions in your prior, and this model only uses maximum-likelihood point estimates for the hyperparameters, so in that sense there's not much flexibility. Once all the data have been explained away as noise, the GP is in some sense "saying" that there is no signal, so you get the prior back - which has zero mean. Does this help a bit?
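
    A minimal sketch of how these values can be read off the fitted models (assuming the GPflow 1.x API used in the question; the names differ in GPflow 2.x):

    print(model.as_pandas_table())                  # full parameter table, as shown above
    print(model.kern.variance.read_value())         # kernel (signal) variance
    print(model.kern.lengthscales.read_value())     # kernel lengthscale
    print(model.likelihood.variance.read_value())   # likelihood (noise) variance

    ## scikit-learn stores the optimized kernel on the fitted estimator
    print(gpr.kernel_)                              # e.g. RBF(length_scale=...)
    print(gpr.kernel_.get_params())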
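
    And a small sketch of the predict_y vs. predict_f point (again GPflow 1.x, reusing model, X1, Y1, np and plt from the question's code): predict_f gives the posterior over the latent function only, while predict_y adds the learned likelihood (noise) variance, so its interval should cover most of the observations:

    xtest = np.linspace(np.min(X1), np.max(X1), 20)[:, None]
    mu_f, var_f = model.predict_f(xtest)   # latent function only: tiny variance here
    mu_y, var_y = model.predict_y(xtest)   # roughly var_f plus the likelihood variance

    plt.plot(X1, Y1, 'x', mew=2)
    plt.fill_between(xtest[:, 0],
                     (mu_y - 2 * np.sqrt(var_y))[:, 0],
                     (mu_y + 2 * np.sqrt(var_y))[:, 0],
                     alpha=0.4)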