In the GPflow documentation (e.g. the SVGP and natural gradient notebooks), the Adam optimizer from TensorFlow is used to train the model parameters of the GP model (lengthscale, variance, inducing inputs, etc.) with stochastic variational inference, while the natural gradient optimizer is used for the variational parameters. A snippet looks as follows:
def run_adam(model, iterations):
    """
    Utility function running the Adam optimizer

    :param model: GPflow model
    :param iterations: number of iterations
    """
    # Create an Adam Optimizer action
    logf = []
    train_iter = iter(train_dataset.batch(minibatch_size))
    training_loss = model.training_loss_closure(train_iter, compile=True)
    optimizer = tf.optimizers.Adam()

    @tf.function
    def optimization_step():
        optimizer.minimize(training_loss, model.trainable_variables)

    for step in range(iterations):
        optimization_step()
        if step % 10 == 0:
            elbo = -training_loss().numpy()
            logf.append(elbo)
    return logf
As demonstrated, model.trainable_variables is passed to the Adam optimizer; this property is inherited from tf.Module and contains several parameters, including the lengthscale and variance.
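The snippet above only shows the Adam part; the natural-gradient step mentioned at the start usually runs alongside it. As a rough sketch (not taken from the notebook itself, and assuming an SVGP model with train_dataset and minibatch_size defined as above), combining the two could look like this:

import tensorflow as tf
import gpflow

natgrad_opt = gpflow.optimizers.NaturalGradient(gamma=0.1)
adam_opt = tf.optimizers.Adam(0.001)

# Let NaturalGradient handle the variational parameters and keep them
# out of Adam's variable list.
gpflow.set_trainable(model.q_mu, False)
gpflow.set_trainable(model.q_sqrt, False)

train_iter = iter(train_dataset.batch(minibatch_size))
training_loss = model.training_loss_closure(train_iter, compile=True)

@tf.function
def optimization_step():
    natgrad_opt.minimize(training_loss, var_list=[(model.q_mu, model.q_sqrt)])
    adam_opt.minimize(training_loss, var_list=model.trainable_variables)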
What concerns me is whether the Adam optimizer works on the unconstrained or the constrained version of the model's parameters. A test snippet runs as follows:
import gpflow as gpf
import numpy as np

x = np.arange(10)[:, np.newaxis]
y = np.arange(10)[:, np.newaxis]
model = gpf.models.GPR(
    (x, y),
    kernel=gpf.kernels.SquaredExponential(variance=2, lengthscales=3),
    noise_variance=4,
)
model.kernel.parameters[0].unconstrained_variable is model.trainable_variables[0]
and returns
True
As far as I know, parameters of a Gaussian process such as the lengthscales and variances of a kernel are non-negative, and they should be constrained during training. I am not an expert on the source code of GPflow or TensorFlow, but it seems that Adam is working on the unconstrained parameters. Is this simply a misunderstanding on my part, or is something else going on?
Thanks in advance for any help!
You're right, and that's by design. A constrained variable in GPflow is represented by a Parameter. The Parameter wraps the unconstrained_variable. When you access .trainable_variables on your model, this will include the unconstrained_variable of the Parameter, and so when you pass these variables to the optimizer, the optimizer will train those rather than the Parameter itself.
But your model doesn't see the unconstrained_variable; it sees the Parameter interface, which is a tf.Tensor-like interface related to the wrapped unconstrained_variable via an optional transformation. This transformation maps the unconstrained value to a constrained value, so your model only ever sees the constrained value. It's not a problem that your constrained value must be positive: the transform maps negative unconstrained values to positive constrained values.
You can see the unconstrained and constrained values of the first Parameter for your kernel, as well as the transform that relates them, with
param = model.kernel.parameters[0]
param.value() # this is what your model will see
param.unconstrained_variable # this is what the optimizer will see
param.transform # the above two are related via this
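For example (a minimal sketch, assuming GPflow's default positive transform, which is a softplus-style bijector), you can push the unconstrained variable to a negative value, exactly as Adam is free to do, and check that the constrained value the model sees stays positive:

param = model.kernel.parameters[0]
param.unconstrained_variable.assign(-5.0)    # what the optimizer is free to do
param.unconstrained_variable.numpy()         # -5.0, unconstrained
param.value().numpy()                        # small but strictly positive, what the model sees
param.transform.forward(param.unconstrained_variable).numpy()  # same value, via the transform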