
Restoring GPflow Model with Mean Function doesn't work


I have been following the methodology of saving/restoring GPflow models with success. But now I've run into a snag.

When I try to restore a model with a Linear mean function, the restore crashes with an error.

I think the issue lies in how the TensorFlow variables of the Linear mean function are named. The "-44dbadbb-0" part of the key in the error message (shown below) is random and changes every time the model is rebuilt. If I check the tensor names in the checkpoint written by the save script, using

    from tensorflow.python.tools.inspect_checkpoint import print_tensors_in_checkpoint_file
    print_tensors_in_checkpoint_file(file_name='./model.ckpt', tensor_name='', all_tensors=False)

I get the following output:

    Linear-eeb5f9f3-0/A/unconstrained (DT_DOUBLE) [1,1]
    Linear-eeb5f9f3-0/b/unconstrained (DT_DOUBLE) [1]
    model/X/dataholder (DT_DOUBLE) [15,1]
    model/Y/dataholder (DT_DOUBLE) [15,1]
    model/kern/kernels/0/lengthscales/unconstrained (DT_DOUBLE) []
    model/kern/kernels/0/variance/unconstrained (DT_DOUBLE) []
    model/kern/kernels/1/lengthscales/unconstrained (DT_DOUBLE) []
    model/kern/kernels/1/variance/unconstrained (DT_DOUBLE) []
    model/likelihood/variance/unconstrained (DT_DOUBLE) []

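So the Linear mean function's variables were saved under a different name (Linear-eeb5f9f3-0) from the one the restore script is looking for (Linear-44dbadbb-0), while everything else sits under the model's own model/ scope.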
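A small pure-Python helper can make this comparison mechanical: it parses a listing like the one above and flags any key that is not nested under the model's name scope. The `<ClassName>-<8 hex chars>-<index>` pattern is an assumption inferred from the names shown here, not a documented GPflow format, and the helper names are hypothetical.

```python
import re

# Checkpoint listing from the question (output of print_tensors_in_checkpoint_file)
CKPT_LISTING = """\
Linear-eeb5f9f3-0/A/unconstrained (DT_DOUBLE) [1,1]
Linear-eeb5f9f3-0/b/unconstrained (DT_DOUBLE) [1]
model/X/dataholder (DT_DOUBLE) [15,1]
model/Y/dataholder (DT_DOUBLE) [15,1]
model/kern/kernels/0/lengthscales/unconstrained (DT_DOUBLE) []
model/kern/kernels/0/variance/unconstrained (DT_DOUBLE) []
model/kern/kernels/1/lengthscales/unconstrained (DT_DOUBLE) []
model/kern/kernels/1/variance/unconstrained (DT_DOUBLE) []
model/likelihood/variance/unconstrained (DT_DOUBLE) []
"""

# Assumed pattern for auto-generated scopes: <ClassName>-<8 hex chars>-<index>/
RANDOM_SCOPE = re.compile(r"^[A-Za-z]+-[0-9a-f]{8}-\d+/")


def parse_keys(listing):
    """Return the variable name (first token) of each non-empty line."""
    return [line.split()[0] for line in listing.splitlines() if line.strip()]


def outside_scope(keys, model_name="model"):
    """Return the keys that are not nested under the model's own name scope."""
    prefix = model_name + "/"
    return [k for k in keys if not k.startswith(prefix)]


def has_random_suffix(key):
    """True if the top-level scope looks like an auto-generated name."""
    return RANDOM_SCOPE.match(key) is not None


if __name__ == "__main__":
    keys = parse_keys(CKPT_LISTING)
    for k in outside_scope(keys):
        print(k, has_random_suffix(k))
    # The key the restore script asked for is not in the checkpoint at all
    print("Linear-44dbadbb-0/A/unconstrained" in keys)
```

Run as a script, this prints the two `Linear-eeb5f9f3-0/...` keys (each flagged `True`) followed by `False`: the only keys outside `model/` are exactly the ones with a randomized scope, and the name the restore script looks for never appears.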

I have tried to fix this by renaming the variables before the restore, but this doesn't work with TensorFlow. I also tried different saving/restoring methods, but then I could no longer sample from the model.

Saving the Model


    import gpflow
    import numpy as np
    import tensorflow as tf

    # define data
    rng = np.random.RandomState(4)
    X = rng.uniform(0, 5.0, 15)[:, np.newaxis]
    Y = np.sin((X[:, 0] - 2.5) ** 2).reshape(len(X),1)

    # define the mean function
    mf = gpflow.mean_functions.Linear(np.ones((1,1)),np.zeros((1,)))

    # create the GP model
    with gpflow.defer_build():
        k = gpflow.kernels.Matern32(1)+gpflow.kernels.RBF(1)
        m = gpflow.models.GPR(X, Y, kern=k,name='model',mean_function=mf)
        m.likelihood.variance = 1e-03
        m.likelihood.trainable = False

    tf.global_variables_initializer()

    tf_session = m.enquire_session()
    m.compile( tf_session )

    gpflow.train.ScipyOptimizer().minimize(m)

    saver = tf.train.Saver()
    save_path = saver.save(tf_session, "./model.ckpt")
    print("Model saved in path: %s" % save_path)

Restoring the Model


    import gpflow
    import numpy as np
    import tensorflow as tf

    # define data
    rng = np.random.RandomState(4)
    X = rng.uniform(0, 5.0, 15)[:, np.newaxis]
    Y = np.sin((X[:, 0] - 2.5) ** 2).reshape(len(X),1)

    # define the mean function
    mf = gpflow.mean_functions.Linear(np.ones((1,1)),np.zeros((1,)))

    with gpflow.defer_build():
        k = gpflow.kernels.Matern32(1)+gpflow.kernels.RBF(1)
        m = gpflow.models.GPR(X, Y, kern=k,name='model',mean_function=mf)
        m.likelihood.variance = 1e-03
        m.likelihood.trainable = False

    # construct and compile the tensorflow session
    tf.global_variables_initializer()
    tf_session = m.enquire_session()
    m.compile( tf_session )

    saver = tf.train.Saver()

    # Saver.restore() returns None, so there is no path to capture here
    saver.restore(tf_session, "./model.ckpt")
    print("Model loaded from path: ./model.ckpt")

    m.anchor(tf_session)

The code crashes at saver.restore(tf_session, "./model.ckpt") with the error:

    NotFoundError (see above for traceback): Key Linear-44dbadbb-0/A/unconstrained not found in checkpoint...


Solution

  • defer_build() does a number of things, but one effect of constructing the entire model (i.e. the TensorFlow graph) in one go is that all the TensorFlow variables and placeholders get consistent names, all derived from the name of the model itself (which you set by passing the name='model' keyword argument to the model constructor).

    In your code, however, the Linear mean function is constructed outside of the defer_build() scope. This means gpflow has to construct a graph for it right away, including setting up variables for its parameters (slope and offset in this case). All TensorFlow variables live in a global namespace, so to allow more than one such standalone object to be created, gpflow assigns them randomized names. (For example, imagine constructing a sum of two kernels of the same type!)

    Fortunately, the fix is easy: simply move the construction of the mean function into the defer_build block:

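    import gpflow
    import numpy as np
    import tensorflow as tf

    # data as in the question
    rng = np.random.RandomState(4)
    X = rng.uniform(0, 5.0, 15)[:, np.newaxis]
    Y = np.sin((X[:, 0] - 2.5) ** 2).reshape(len(X), 1)

    with gpflow.defer_build():
        # define the mean function inside defer_build() so its variables
        # are created under the model's name scope
        mf = gpflow.mean_functions.Linear(np.ones((1, 1)), np.zeros((1,)))

        k = gpflow.kernels.Matern32(1) + gpflow.kernels.RBF(1)
        m = gpflow.models.GPR(X, Y, kern=k, mean_function=mf, name='model')
        m.likelihood.variance = 1e-03
        m.likelihood.trainable = False

    # construct and compile the tensorflow session
    tf.global_variables_initializer()
    tf_session = m.enquire_session()
    m.compile(tf_session)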
    

    If you do this in both the "save" and "load" scripts, the mean function's variables are named under the model's scope, the checkpoint keys match between runs, and everything should run as you expect. Hope this helps!
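To make the naming argument concrete, here is a toy simulation in plain Python. It is NOT GPflow's implementation: the `<Class>-<8 hex>-0` format and the `mean_function` sub-scope under `model/` are assumptions based on the checkpoint listing in the question, and all function names are hypothetical. It only models the two cases the answer contrasts: an object built standalone gets a fresh random scope each run, while an object built as part of a named parent gets a deterministic path.

```python
import uuid

# Toy model of the naming behaviour described in the answer (NOT GPflow code).
# Assumed formats: standalone "<Class>-<8 hex>-0", nested "<model>/mean_function".


def standalone_scope(cls_name):
    """Scope for an object built on its own: randomized on every run."""
    return "{}-{}-0".format(cls_name, uuid.uuid4().hex[:8])


def nested_scope(parent_name, attr_name):
    """Scope for an object built as part of a named parent: deterministic."""
    return "{}/{}".format(parent_name, attr_name)


def linear_mean_keys(built_inside_defer_build, model_name="model"):
    """Checkpoint keys the Linear mean function's A and b would get."""
    if built_inside_defer_build:
        scope = nested_scope(model_name, "mean_function")
    else:
        scope = standalone_scope("Linear")
    return sorted("{}/{}/unconstrained".format(scope, p) for p in ("A", "b"))


if __name__ == "__main__":
    # Simulate the separate "save" run and "restore" run.
    saved, restored = linear_mean_keys(False), linear_mean_keys(False)
    print("outside defer_build, keys match:", saved == restored)
    saved, restored = linear_mean_keys(True), linear_mean_keys(True)
    print("inside defer_build, keys match:", saved == restored)
```

The first comparison is practically always `False` (two independent random suffixes would have to collide), which corresponds to the `NotFoundError`; the second is always `True`, which is why moving the mean function into `defer_build()` in both scripts lets the restore find its keys.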