I have two sets of vectors A (red) and B (blue). I'm looking to find the nearest vector in A to each point in B.
I got this working using:
import numpy as np
import random
def get_assignment(A, B):
    """
    returns `assignment`, where `A[assignment[i]]` is the nearest point in A to `B[i]`
    """
    # cost[i, j] is the distance from A[i] to B[j]; argmin over axis 0 picks,
    # for every B[j], the index of the closest point in A
    cost = np.linalg.norm(A[:, np.newaxis, :] - B, axis=2)
    assignment = np.argmin(cost, axis=0)
    return assignment

if __name__ == '__main__':
    An = 3
    Bn = 10
    A = np.array([ [ random.random(), random.random() ] for i in range(An) ])
    B = np.array([ [ random.random(), random.random() ] for i in range(Bn) ])
    assignment = get_assignment(A, B)

    import pylab
    # draw each A point (red), the B points assigned to it (blue),
    # and a line connecting each pair
    for i in range(len(A)):
        Bi = np.where(assignment == i)[0]
        for j in Bi:
            x1, y1 = A[i]
            x2, y2 = B[j]
            pylab.plot(x1, y1, 'ro')
            pylab.plot(x2, y2, 'bo')
            pylab.plot([x1, x2], [y1, y2], 'k-', lw=0.5)
    pylab.axis('equal')
    pylab.show()
But I'm looking to do this where A has around 1,000 elements and B around 1,000,000, and at that size the intermediate arrays built inside get_assignment no longer fit in memory. Is there a way to do this using less memory?
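For scale, the broadcasted difference array and the cost matrix are the problem (back-of-the-envelope, assuming float64):

# A[:, np.newaxis, :] - B has shape (An, Bn, 2); cost has shape (An, Bn)
An, Bn = 1000, 1000000
print(An * Bn * 2 * 8 / 1e9)  # ~16 GB for the difference array
print(An * Bn * 8 / 1e9)      # ~8 GB for the cost matrix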
I would suggest using a K-D tree: it finds each nearest neighbour without ever materializing the full cost matrix.
Example:
import numpy as np
import random
import sklearn.neighbors
def get_assignment(A, B):
    """
    returns `assignment`, where `A[assignment[i]]` is the nearest point in A to `B[i]`
    """
    # build the tree once over the small set A, then query it with every point in B;
    # query returns indices of shape (len(B), 1), so flatten to 1-D
    tree = sklearn.neighbors.KDTree(A)
    assignment = tree.query(B, return_distance=False).reshape(-1)
    return assignment

if __name__ == '__main__':
    An = 1000
    Bn = 1000000
    A = np.random.rand(An, 2)
    B = np.random.rand(Bn, 2)
    assignment = get_assignment(A, B)
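If you also need the distances, the same query can return them alongside the indices (a small variant of the function above; KDTree.query with return_distance=True returns both):

def get_assignment_with_distance(A, B):
    """
    returns (assignment, dist), where A[assignment[i]] is the nearest point
    in A to B[i] and dist[i] is the distance between them
    """
    tree = sklearn.neighbors.KDTree(A)
    dist, ind = tree.query(B, k=1, return_distance=True)
    return ind.reshape(-1), dist.reshape(-1)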
Measuring this with /usr/bin/time -v, it takes roughly 36 MB of memory to do this query. It's also much faster than the brute-force version.
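If you'd rather not add a scikit-learn dependency, scipy.spatial.cKDTree can be used the same way; a quick sketch (an alternative, not the version measured above):

import numpy as np
import scipy.spatial

def get_assignment_scipy(A, B):
    # cKDTree.query returns (distances, indices); the indices play the same
    # role as `assignment` above
    tree = scipy.spatial.cKDTree(A)
    _, assignment = tree.query(B, k=1)
    return assignment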