python tensorflow tensorflow2.0 tensorflow-probability gpflow

Custom Haversine Matern52 kernel in GPflow 2.0

Using GPflow 2.0, I want to implement a custom Matern 5/2 kernel with the Haversine instead of Euclidean distance. I created a custom class on top of the gpflow.kernels.Matern52 class that includes a scaled_squared_dist function to override the scaled_squared_euclid_dist inherited from the Stationary class.

The class as currently written does not alter the Matern52 class; GP regression with this HaversineKernel_Matern52 kernel behaves exactly like GP regression with a Matern52 kernel.

import gpflow
from gpflow.utilities.ops import square_distance

class HaversineKernel_Matern52(gpflow.kernels.Matern52):
    """
    Isotropic Matern52 Kernel with Haversine distance instead of euclidean distance.
    Assumes 2-dimensional inputs, with columns [latitude, longitude] in degrees.
    """

    def __init__(self, lengthscale=1.0, variance=1.0, active_dims=None, ard=None):
        super().__init__(active_dims=active_dims, variance=variance, 
                         lengthscale=lengthscale, ard=ard)

    def haversine_dist(self, X, X2):
        pi = np.pi / 180
        f = tf.expand_dims(X * pi, -2)  # ... x N x 1 x D
        f2 = tf.expand_dims(X2 * pi, -3)  # ... x 1 x M x D
        d = tf.sin((f - f2) / 2) ** 2
        lat1, lat2 = tf.expand_dims(X[:, 0] * pi, -1), \
                    tf.expand_dims(X2[:, 0] * pi, -2)
        cos_prod = tf.cos(lat2) * tf.cos(lat1)
        a = d[:,:,0] + cos_prod * d[:,:,1]
        c = tf.asin(tf.sqrt(a)) * 6371 * 2
        return c

    def scaled_squared_dist(self, X, X2=None):
        """
        Returns ||(X - X2ᵀ) / ℓ||² i.e. squared L2-norm.
        """

        X_scaled = X / self.lengthscale
        X2_scaled = X2 / self.lengthscale if X2 is not None else X2
        return square_distance(X_scaled, X2_scaled)

What do I need to change to make this kernel recalculate Haversine distance properly?

This question builds on GPflow issue #969.

Thank you!

Solution

The GP code makes use of a kernel's K (and K_diag) methods. In GPflow 2.0.0rc1 and the develop branch, for subclasses of Stationary, K calls self.scaled_squared_euclid_dist -- but the method you define in your Haversine version is called scaled_squared_dist, so this is a new method and you don't actually overwrite its base class method from the Matern52 kernel class! (Maybe the method in gpflow.kernels.stationaries.Stationary would be better called scaled_squared_dist instead.)

Moreover, your scaled_squared_dist only calls square_distance instead of self.haversine_dist; assuming the latter returns a distance, not its square, you also need to wrap it in tf.square(). The haversine_dist method also does not seem to take into account the lengthscale parameter.

If you want to try out several different kernels with a Haversine distance, a more robust/reusable way of coding this up might be to write a wrapper class that takes any stationary kernel as an argument, and redefining the kernel matrix methods:

def haversine_dist(X, X2):
    pi = np.pi / 180
    f = tf.expand_dims(X * pi, -2)  # ... x N x 1 x D
    f2 = tf.expand_dims(X2 * pi, -3)  # ... x 1 x M x D
    d = tf.sin((f - f2) / 2) ** 2
    lat1, lat2 = tf.expand_dims(X[:, 0] * pi, -1), \
                tf.expand_dims(X2[:, 0] * pi, -2)
    cos_prod = tf.cos(lat2) * tf.cos(lat1)
    a = d[:,:,0] + cos_prod * d[:,:,1]
    c = tf.asin(tf.sqrt(a)) * 6371 * 2
    return c


class HaversineDistance(gpflow.kernels.Stationary):
    def __init__(self, base: gpflow.kernels.Stationary):
        self.base = base

    @property
    def active_dims(self):
        return self.base.active_dims

    @property
    def variance(self):
        return self.base.variance  # for K_diag to work

    def K(self, X, X2=None, presliced=False):
        if not presliced:
            X, X2 = self.slice(X, X2)
        if X2 is None:
            X2 = X

        # Note that lengthscale is in Haversine distance space:
        r = haversine_dist(X, X2) / self.base.lengthscale

        if hasattr(self.base, "K_r"):
            return self.base.K_r(r)
        else:
            return self.base.K_r2(tf.square(r))

where I factored out your haversine_dist into its own function (this would make it easier to write tests for!). This works with any stationary kernel, whether SquaredExponential (which defines K_r2) or Matern52 (which defines K_r). You can use it simply as follows:

k = HaversineDistance(gpflow.kernels.SquaredExponential())

or with a Matern 3/2 as the base kernel:

k = HaversineDistance(gpflow.kernels.Matern32())

Note that, as always, hyperparameter initialisation is important, and you need to remember that the lengthscale of the base kernel acts within the Haversine distance space and so needs to be appropriate for that - depending on your dataset. Either pass it in as the lengthscale and variance parameters when constructing the base kernel:

k = HaversineDistance(gpflow.kernels.Matern32(variance=5.0, lengthscale=0.1))

or by using assign() on the gpflow.Parameter attributes of the base kernel:

k.base.variance.assign(5.0)
k.base.lengthscale.assign(0.1)