I have two 2D points sets A
and B
. I want to find the first nearest neighbor in A
for each point in B
.
However, I am dealing with uncertain points (i.e. a point has a mean (2D vector) and a 2*2 covariance matrix).
I thus would like to use the Mahalanobis distance, but in scikit-learn
(for example), I cannot pass a covariance matrix for each point, as it expects a single covariance matrix.
Currently, considering only the average locations (i.e. mean of my 2D normal distribution), I have:
nearest_neighbors = NearestNeighbors(n_neighbors=1, metric='l2').fit(A)
distance, indices = nearest_neighbors.kneighbors(B)
With my uncertain points, instead of using the L2 norm as a distance, I would rather compute (between a point a
in A
and a point b
in B, their Mahalanobis distance:
d(a, b) = sqrt( transpose(mu_a-mu_b) * C * (mu_a-mu_b))
where C = inv(cov_a + cov_b)
where mu_a
(resp mu_b
) and cov_a
(resp. cov_b
) are the 2D mean and 2*2 covariance matrix of uncertain point a
(resp. b
).
I ended up using a custom distance:
def my_mahalanobis_distance(x, y):
'''
x: array of shape (4,) x[0]: mu_x_1, x[1]: mu_x_2,
x[2]: cov_x_11, x[3]: cov_x_22
y: array of shape (4,) y[0]: mu_ y_1, y[1]: mu_y_2,
y[2]: cov_y_11, y[3]: cov_y_22
'''
return sp.spatial.distance.mahalanobis(x[:2], y[:2],
np.linalg.inv(np.diag(x[2:])
+ np.diag(y[2:])))
Thus a point has 4 features:
x
and y
coordinatesx
and y
variances (covariance matrix is diagonal in my case)