Tags: python, scikit-learn, nearest-neighbor, mahalanobis

Nearest neighbors with uncertain points


I have two 2D point sets A and B. I want to find the first nearest neighbor in A for each point in B. However, I am dealing with uncertain points (i.e. each point has a mean (a 2D vector) and a 2×2 covariance matrix).

I would thus like to use the Mahalanobis distance, but in scikit-learn (for example) I cannot pass a covariance matrix for each point, as it expects a single covariance matrix for the whole dataset.
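For reference, the standard scikit-learn usage looks like the sketch below; the data and the shared covariance matrix are placeholders. The point is that metric_params takes a single inverse covariance matrix ('VI') applied to all points:

import numpy as np
from sklearn.neighbors import NearestNeighbors

# Placeholder data and a single shared 2x2 covariance matrix
A = np.random.rand(10, 2)
B = np.random.rand(5, 2)
shared_cov = np.array([[0.5, 0.1],
                       [0.1, 0.3]])

# The built-in 'mahalanobis' metric accepts ONE inverse covariance
# matrix for the whole dataset, not one per point.
nn = NearestNeighbors(n_neighbors=1, metric='mahalanobis',
                      metric_params={'VI': np.linalg.inv(shared_cov)})
nn.fit(A)
distances, indices = nn.kneighbors(B)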

Currently, considering only the average locations (i.e. the means of my 2D normal distributions), I have:

from sklearn.neighbors import NearestNeighbors

nearest_neighbors = NearestNeighbors(n_neighbors=1, metric='l2').fit(A)
distance, indices = nearest_neighbors.kneighbors(B)

With my uncertain points, instead of using the L2 norm as a distance, I would rather compute, between a point a in A and a point b in B, their Mahalanobis distance:

d(a, b) = sqrt( transpose(mu_a-mu_b) * C * (mu_a-mu_b))

where C = inv(cov_a + cov_b)

where mu_a (resp. mu_b) and cov_a (resp. cov_b) are the 2D mean and 2×2 covariance matrix of the uncertain point a (resp. b).
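A minimal sketch of this distance written out directly; the helper name uncertain_mahalanobis and the example values are mine, not part of any library:

import numpy as np

def uncertain_mahalanobis(mu_a, cov_a, mu_b, cov_b):
    # C = inv(cov_a + cov_b); d = sqrt((mu_a - mu_b)^T * C * (mu_a - mu_b))
    diff = mu_a - mu_b
    C = np.linalg.inv(cov_a + cov_b)
    return np.sqrt(diff @ C @ diff)

# Example with placeholder means and covariances
mu_a, cov_a = np.array([0.0, 0.0]), np.array([[0.2, 0.0], [0.0, 0.1]])
mu_b, cov_b = np.array([1.0, 1.0]), np.array([[0.1, 0.0], [0.0, 0.3]])
print(uncertain_mahalanobis(mu_a, cov_a, mu_b, cov_b))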


Solution

  • I ended up using a custom distance:

    import numpy as np
    import scipy as sp
    import scipy.spatial  # needed so that sp.spatial.distance is available

    def my_mahalanobis_distance(x, y):
        '''
        x: array of shape (4,)  x[0]: mu_x_1,   x[1]: mu_x_2,
                                x[2]: cov_x_11, x[3]: cov_x_22
        y: array of shape (4,)  y[0]: mu_y_1,   y[1]: mu_y_2,
                                y[2]: cov_y_11, y[3]: cov_y_22
        '''
        # Mahalanobis distance between the two means, using the inverse of
        # the sum of the (diagonal) covariance matrices as the metric.
        return sp.spatial.distance.mahalanobis(x[:2], y[:2],
                                               np.linalg.inv(np.diag(x[2:])
                                                             + np.diag(y[2:])))


    Thus a point has 4 features (see the usage sketch after this list):

    • x and y coordinates
    • x and y variances (covariance matrix is diagonal in my case)
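
    For completeness, a minimal usage sketch with this custom metric; the random 4-feature arrays and the names A_features / B_features are placeholders of mine. Note that a Python callable metric is evaluated pair by pair, so this will be slow on large point sets:

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    # Placeholder data: each row is [mu_1, mu_2, cov_11, cov_22]
    rng = np.random.default_rng(0)
    A_features = np.hstack([rng.random((10, 2)), 0.1 + rng.random((10, 2))])
    B_features = np.hstack([rng.random((5, 2)), 0.1 + rng.random((5, 2))])

    nearest_neighbors = NearestNeighbors(n_neighbors=1,
                                         metric=my_mahalanobis_distance)
    nearest_neighbors.fit(A_features)
    distances, indices = nearest_neighbors.kneighbors(B_features)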