Tags: tensorflow, tensorflow-probability, gpflow

Numerical instabilities when bounding GPflow hyperparameters


In "Bounding hyperparameter optimization with Tensorflow bijector chain in GPflow 2.0", I found an excellent explanation of how to set bounds on my hyperparameters.

Unfortunately, I noticed that using the tensorflow_probability.bijectors.Sigmoid transform causes numerical instabilities which lead to parameter values outside [low, high] for me.

My current workaround is to define my own sigmoid transform that uses the alternative implementation in the comments of the tensorflow_probability source code:

import tensorflow as tf
from tensorflow_probability import bijectors as tfb
from tensorflow_probability import math as tfm

class mySigmoid(tfb.Sigmoid):
    @staticmethod  # no `self` argument, so it can be called as self._stable_sigmoid(x)
    def _stable_sigmoid(x):
        """A (more) numerically stable sigmoid than `tf.math.sigmoid`."""
        x = tf.convert_to_tensor(x)
        if x.dtype == tf.float64:
            cutoff = -20
        else:
            cutoff = -9
        return tf.where(x < cutoff, tf.exp(x), tf.math.sigmoid(x))

    def _forward(self, x):
        if self._is_standard_sigmoid:
            return self._stable_sigmoid(x)
        lo = tf.convert_to_tensor(self.low)  # Concretize only once
        hi = tf.convert_to_tensor(self.high)
        ans = hi * tf.sigmoid(x) + lo * tf.sigmoid(-x)
        # clip_by_value_preserve_gradient lives in tfp.math (imported as tfm), not in tfp.bijectors
        return tfm.clip_by_value_preserve_gradient(ans, lo, hi)
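
For readers without TensorFlow at hand, the bounded branch of `_forward` above can be sketched in plain Python. This is only an illustration of the formula `hi * sigmoid(x) + lo * sigmoid(-x)` followed by a clip; the function names and the bounds 0.1 and 0.3 are made up for the example, and plain `min`/`max` clipping does not reproduce the gradient-preserving behaviour of `clip_by_value_preserve_gradient`.

```python
import math


def sigmoid(x):
    """Logistic function, written so that math.exp never overflows."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bounded_sigmoid(x, low, high):
    """Map a real x into [low, high] as a weighted average of the bounds."""
    y = high * sigmoid(x) + low * sigmoid(-x)
    # Final clip guards against rounding pushing y past either bound.
    return min(max(y, low), high)


low, high = 0.1, 0.3  # hypothetical bounds, for illustration only
for x in (-50.0, -1.0, 0.0, 1.0, 50.0):
    print(f"x={x:6.1f}  y={bounded_sigmoid(x, low, high)!r}")
```

Because the two sigmoid weights sum to (approximately) one, the unclipped value is a weighted average of `low` and `high`; the clip then makes the bound hold exactly regardless of rounding.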

However, the TensorFlow Probability source notes that this approach has some drawbacks, so I wanted to ask: are there alternative ways to bound hyperparameters in GPflow?


Solution

  • Another way to constrain a parameter in GPflow is to place a prior on it. For example:

    import gpflow
    import tensorflow_probability as tfp
    k = gpflow.kernels.Matern32()
    k.variance.prior = tfp.distributions.Gamma(gpflow.utilities.to_default_float(2), gpflow.utilities.to_default_float(3))
    

    See more here.

    Whether a prior is an appropriate solution depends on the details of what you're trying to accomplish.
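
One thing to keep in mind when weighing this option: as far as I understand, a prior in GPflow enters training by adding its log density to the objective, so it discourages unlikely values rather than enforcing hard bounds. The sketch below is a plain-Python illustration of that penalty, not GPflow code. It evaluates a hand-written Gamma log-density with concentration 2 and rate 3, which is how I read the positional arguments of `Gamma(2, 3)` in the snippet above; the function names are hypothetical.

```python
import math


def gamma_log_prob(x, concentration=2.0, rate=3.0):
    """Log-density of a Gamma(concentration, rate) distribution at x."""
    if x <= 0.0:
        return -math.inf  # outside the support of the Gamma distribution
    return (
        concentration * math.log(rate)
        - math.lgamma(concentration)
        + (concentration - 1.0) * math.log(x)
        - rate * x
    )


def prior_penalty(x):
    """Term the prior contributes to a negative-log-posterior objective."""
    return -gamma_log_prob(x)


# The penalty is smallest at the mode, (concentration - 1) / rate = 1/3,
# and grows smoothly as the value moves away from it.
for variance in (0.05, 1.0 / 3.0, 1.0, 5.0):
    print(f"variance={variance:.3f}  penalty={prior_penalty(variance):.3f}")
```

A value far from the mode is penalized but still reachable if the data favour it strongly enough, which is the main difference from a bounded transform.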