Incorporate known data moments into GPFlow fitting

I have recently been working with gpflow, in-particular Gaussian process regression, to model a process for which I have access to approximated moments for each input. I have a vector of input values X of size (N,1) and a vector of responses Y of size (N,1). However, I also know, for each (x,y) pair, an approximation of the associated variance, skewness, kurtosis and so on for the particular y value.

From this, I know properties that inform me of appropriate likelihoods to use for each data point. In the simplest case, I just assume all likelihoods are Gaussian, and specify the variance at each point. I've created a minimal example of my code by adapting the tutorial on: https://nbviewer.jupyter.org/github/GPflow/GPflow/blob/develop/doc/source/notebooks/advanced/varying_noise.ipynb#Demo-2:-grouped-noise-variances.

import numpy as np
import gpflow 

def generate_data(N=100):
    X = np.random.rand(N)[:, None] * 10 - 5  # Inputs, shape N x 1
    F = 2.5 * np.sin(6 * X) + np.cos(3 * X)  # Mean function values
    groups = np.arange( 0, N, 1 ).reshape(-1,1)  
    NoiseVar = np.array([i/100.0 for i in range(N)])[groups]  
    Y = F + np.random.randn(N, 1) * np.sqrt(NoiseVar)  # Noisy data
    return X, Y, groups, NoiseVar

# Get data
X, Y, groups, NoiseVar = generate_data()
Y_data = np.hstack([Y, groups])
# Generate one likelihood per data-point
likelihood = gpflow.likelihoods.SwitchedLikelihood( [gpflow.likelihoods.Gaussian(variance=NoiseVar[i]) for i in range(Y.shape[0])])

# model construction (notice that num_latent is 1)
kern  = gpflow.kernels.Matern52(input_dim=1, lengthscales=0.5)
model = gpflow.models.VGP(X, Y_data, kern=kern, likelihood=likelihood, num_latent=1)
# Specify the likelihood as non-trainable
model.likelihood.set_trainable(False)

# build the natural gradients optimiser
natgrad_optimizer = gpflow.training.NatGradOptimizer(gamma=1.)
natgrad_tensor    = natgrad_optimizer.make_optimize_tensor(model, var_list=[(model.q_mu, model.q_sqrt)])
session = model.enquire_session()
session.run(natgrad_tensor)
# update the cache of the variational parameters in the current session
model.anchor(session)

# Stop Adam from optimising the variational parameters
model.q_mu.trainable   = False
model.q_sqrt.trainable = False
# Create Adam tensor
adam_tensor = gpflow.train.AdamOptimizer(learning_rate=0.1).make_optimize_tensor(model)
for i in range(200): 
    session.run(natgrad_tensor)
    session.run(adam_tensor)
# update the cache of the parameters in the current session
model.anchor(session)
print(model)

The above code works for a gaussian likelihood, and known variances. Inspecting my real data, I see that it is skewed very often and as a result, I want to use non-gaussian likelihoods to model it, but am unsure how to specify these other likelihood parameters given what I know.

So my question is: Given this setup, how can I adapt my code so far to include non-Gaussian likelihoods at each step, in-particular specifying and fixing their parameters based on my known variances, skewness, kurtosis and so on associated with each individual y value?

Solution

Firstly, you will need to choose which non-Gaussian likelihood you use. GPflow includes various ones in likelihoods.py. You then need to adapt the line

likelihood = gpflow.likelihoods.SwitchedLikelihood(
    [gpflow.likelihoods.Gaussian(variance=NoiseVar[i]) for i in range(Y.shape[0])]
)

to give a list of your non-Gaussian likelihoods.

Which likelihood can take advantage of your skewness and kurtosis information is a statistical question. Depending on what you come up with, you may need to implement your own likelihood class, which can be done by inheriting from Likelihood. You should be able to follow some other examples from likelihoods.py.