
Scikit-learn Nearest Neighbor search with weighted distance metric


I'm trying to use the Minkowski distance with per-feature weights, but the sklearn metrics do not seem to allow this. I tried pdist and cdist from scipy, but these calculate all the distances beforehand!

    import pandas as pd
    from scipy.spatial.distance import minkowski
    from sklearn.neighbors import NearestNeighbors

    X = pd.read_csv('.file.csv')

    weights = [1] * X.shape[1]  # filled with 1's for now

    # minkowski(...) is called right here, with u and v undefined
    nbrs = NearestNeighbors(
        algorithm='brute',
        metric=minkowski(u, v, p=1, w=weights),
        n_jobs=-1,
    ).fit(X)

    distances, indices = nbrs.kneighbors(X=X, n_neighbors=50, return_distance=True)

This raises:

    NameError: name 'u' is not defined

callable(minkowski) returns True!

I know I'm not passing u and v, so unsurprisingly the error shows up. The documentation is a bit thin on using metrics other than those built into sklearn. How can I use a weighted metric, for example one from scipy?


Solution

  • The problem is the way you are trying to include the weights. u and v are supplied internally by sklearn each time it calls the metric, so they should not appear in your code; writing minkowski(u, v, ...) calls the function immediately instead of passing a callable. Instead, use functools.partial to create a partial function from minkowski with the values of p and w predefined.

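    from functools import partial

    from scipy.spatial.distance import minkowski

    # Fix p and w up front; sklearn supplies u and v on each call
    w_minkowski = partial(minkowski, p=1, w=weights)
    nbrs = NearestNeighbors(algorithm='brute', metric=w_minkowski, n_jobs=-1)
    nbrs.fit(X)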
    ...
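
To check that the partial really applies the weights, here is a minimal self-contained sketch. The random data, the weight values and the neighbour count are made up for illustration and stand in for the CSV from the question. It compares the neighbour distances from sklearn against a full weighted distance matrix computed with scipy's cdist.

```python
from functools import partial

import numpy as np
from scipy.spatial.distance import cdist, minkowski
from sklearn.neighbors import NearestNeighbors

# Synthetic stand-in for the CSV data (assumption: 20 rows, 3 numeric features)
rng = np.random.default_rng(0)
X = rng.random((20, 3))

# Hypothetical per-feature weights, one per column of X
weights = np.array([1.0, 2.0, 0.5])

# Bind p and w now; sklearn passes the two row vectors on each call
w_minkowski = partial(minkowski, p=1, w=weights)

nbrs = NearestNeighbors(algorithm='brute', metric=w_minkowski).fit(X)
distances, indices = nbrs.kneighbors(X, n_neighbors=5)

# Reference: full weighted distance matrix, then the 5 smallest per row
reference = cdist(X, X, metric='minkowski', p=1, w=weights)
expected = np.sort(reference, axis=1)[:, :5]

print(np.allclose(distances, expected))
```

A Python-level callable is invoked once per pair of rows, so this will likely be much slower than a built-in metric on large datasets.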