In the GPflow documentation (e.g. the SVGP and natural gradient notebooks), the Adam optimizer from TensorFlow is used to train the model parameters of the GP model (lengthscale, variance, inducing inputs, etc.) with stochastic variational inference, while the natural gradient optimizer is used for the variational parameters. A snippet looks as follows:
def run_adam(model, iterations):
    """
    Utility function running the Adam optimizer

    :param model: GPflow model
    :param iterations: number of iterations
    """
    # Create an Adam Optimizer action
    logf = []
    train_iter = iter(train_dataset.batch(minibatch_size))
    training_loss = model.training_loss_closure(train_iter, compile=True)
    optimizer = tf.optimizers.Adam()

    @tf.function
    def optimization_step():
        optimizer.minimize(training_loss, model.trainable_variables)

    for step in range(iterations):
        optimization_step()
        if step % 10 == 0:
            elbo = -training_loss().numpy()
            logf.append(elbo)
    return logf
As demonstrated, model.trainable_variables is passed to the Adam optimizer; this property is inherited from tf.Module and contains several parameters, including the lengthscale and variance.
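The snippet above only shows the Adam part; the natural-gradient step mentioned at the start usually runs alongside it. As a rough sketch (not taken from the notebook itself, and assuming an SVGP model with train_dataset and minibatch_size defined as above), combining the two could look like this:

import tensorflow as tf
import gpflow

natgrad_opt = gpflow.optimizers.NaturalGradient(gamma=0.1)
adam_opt = tf.optimizers.Adam(0.001)

# Let NaturalGradient handle the variational parameters and keep them
# out of Adam's variable list.
gpflow.set_trainable(model.q_mu, False)
gpflow.set_trainable(model.q_sqrt, False)

train_iter = iter(train_dataset.batch(minibatch_size))
training_loss = model.training_loss_closure(train_iter, compile=True)

@tf.function
def optimization_step():
    natgrad_opt.minimize(training_loss, var_list=[(model.q_mu, model.q_sqrt)])
    adam_opt.minimize(training_loss, var_list=model.trainable_variables)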
What concerns me is whether the Adam optimizer works on the unconstrained or the constrained version of the model's parameters. A test snippet runs as follows:
import gpflow as gpf
import numpy as np

x = np.arange(10)[:, np.newaxis]
y = np.arange(10)[:, np.newaxis]
model = gpf.models.GPR(
    (x, y),
    kernel=gpf.kernels.SquaredExponential(variance=2, lengthscales=3),
    noise_variance=4,
)
model.kernel.parameters[0].unconstrained_variable is model.trainable_variables[0]
and returns
True
As far as I know, parameters of a Gaussian process such as the lengthscales and variances of a kernel are non-negative, and they should be constrained during training. I am not an expert on the source code of GPflow or TensorFlow, but it seems that Adam is working on the unconstrained parameters. Is this simply a misunderstanding on my part, or is something else going on?
Thanks in advance for any help!
You're right, and that's by design. A constrained variable in GPflow is represented by a Parameter. The Parameter wraps the unconstrained_variable. When you access .trainable_variables on your model, this will include the unconstrained_variable of the Parameter, and so when you pass these variables to the optimizer, the optimizer will train those rather than the Parameter itself.
But your model doesn't see the unconstrained_variable; it sees the Parameter interface, which is a tf.Tensor-like interface related to the wrapped unconstrained_variable via an optional transformation. This transformation maps the unconstrained value to a constrained value, so your model only ever sees the constrained value. It's not a problem that your constrained value must be positive: the transform maps negative unconstrained values to positive constrained values.
You can see the unconstrained and constrained values of the first Parameter for your kernel, as well as the transform that relates them, with
param = model.kernel.parameters[0]
param.value() # this is what your model will see
param.unconstrained_variable # this is what the optimizer will see
param.transform # the above two are related via this
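For example (a minimal sketch, assuming GPflow's default positive transform, which is a softplus-style bijector), you can push the unconstrained variable to a negative value, exactly as Adam is free to do, and check that the constrained value the model sees stays positive:

param = model.kernel.parameters[0]
param.unconstrained_variable.assign(-5.0)    # what the optimizer is free to do
param.unconstrained_variable.numpy()         # -5.0, unconstrained
param.value().numpy()                        # small but strictly positive, what the model sees
param.transform.forward(param.unconstrained_variable).numpy()  # same value, via the transform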