python scikit-learn distance haversine

How to build a BallTree with haversine distance metric?


I have been studying how to implement a sklearn.neighbors.BallTree with sklearn.metrics.pairwise.haversine_distances metric.

Despite my efforts, I couldn't reach a working script.

I followed the standard BallTree example from the sklearn documentation, but as soon as I pass the haversine distance function as the metric, initializing the class fails with a ValueError. Here is a simple script that reproduces the problem:

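import numpy as np
from sklearn import metrics
from sklearn.neighbors import BallTree

rng = np.random.RandomState(0)  # seeded random generator
X = rng.random_sample((10, 2))  # 10 points in 2 dimensions
# Passing the function itself as the metric is what raises the ValueError
tree = BallTree(X, metric=metrics.pairwise.haversine_distances)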

Returned error:

ValueError: Buffer has wrong number of dimensions (expected 2, got 1)

How can I resolve this?


Solution

  • Use metric="haversine". From the docs (emphasis in the original):

    Note: Callable functions in the metric parameter are NOT supported for KDTree and Ball Tree. Function call overhead will result in very poor performance.

    See also the documentation on distance_metrics.
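
    As a minimal sketch of the fix, the snippet below builds the tree with the string metric and runs one query. The sample coordinates, the variable names, and the Earth radius constant are assumptions made for illustration and are not part of the question. Note that the haversine metric expects two columns, [latitude, longitude], in radians, and returns distances in radians on the unit sphere.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, used only for unit conversion

# Hypothetical sample points as (latitude, longitude) in degrees
cities_deg = np.array([
    [48.8566, 2.3522],    # Paris
    [51.5074, -0.1278],   # London
    [40.7128, -74.0060],  # New York
])

# The haversine metric works on [lat, lon] pairs expressed in radians
cities_rad = np.radians(cities_deg)

# Pass the metric by name instead of passing a callable
tree = BallTree(cities_rad, metric="haversine")

# Two nearest neighbours of the first point (the point itself comes first)
dist_rad, idx = tree.query(cities_rad[:1], k=2)

# Distances come back in radians; scale by the radius to get kilometres
dist_km = dist_rad * EARTH_RADIUS_KM
print(idx[0], dist_km[0])
```

    The random X from the question also works with metric="haversine", since its values in [0, 1) are simply treated as radians. For real coordinates, convert from degrees first as shown.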