I have a base vector (consisting of 1's and 0's) and I want to find the cosine distance to 50,000 other vectors (also consisting of 1's and 0's). I found many ways to calculate an entire matrix of pairwise distances, but I'm not interested in that. Rather, I just want the 50,000 distances from my base vector to each of the other vectors (and then to sort them to find the top 5). What's the fastest way to achieve this?
The vectorized computation is exactly the same as computing each distance individually, as long as you are careful with the axes. Here each "other" vector is stored as a row:
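import numpy
others = numpy.random.randint(0, 2, (10, 10))  # one "other" vector per row
base = numpy.random.randint(0, 2, (10, 1))     # base vector as a column
# the vectors are rows, so take each row's norm with axis=1 (axis=0 would give column norms)
d = numpy.inner(base.T, others) / (numpy.linalg.norm(others, axis=1) * numpy.linalg.norm(base))
# d has shape (1, 10) and holds cosine similarities; the cosine distance is 1 - d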
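To get from these values to the top 5 the question asks for, a full sort is not required. The following is a minimal sketch, not a definitive implementation: the function name `top_k_cosine`, the vector length of 100, and the choice to give all-zero vectors a similarity of 0 (their cosine is undefined) are all assumptions made for illustration. It uses `numpy.argpartition` to pick out the k smallest distances and then sorts only those k.

```python
import numpy as np

def top_k_cosine(base, others, k=5):
    """Indices and cosine distances of the k rows of `others` closest to `base`.

    Hypothetical helper for illustration; each row of `others` is one vector.
    """
    base = np.asarray(base, dtype=np.float64).ravel()
    others = np.asarray(others, dtype=np.float64)

    dots = others @ base                                   # shape (n,)
    norms = np.linalg.norm(others, axis=1) * np.linalg.norm(base)

    # Assumption: an all-zero vector has no direction, so treat its similarity as 0.
    sim = np.divide(dots, norms, out=np.zeros_like(dots), where=norms > 0)
    dist = 1.0 - sim                                       # cosine distance

    k = min(k, dist.size)
    idx = np.argpartition(dist, k - 1)[:k]                 # k smallest, unordered
    idx = idx[np.argsort(dist[idx])]                       # order just those k
    return idx, dist[idx]

# Example with the sizes from the question (vector length 100 is an assumption).
rng = np.random.default_rng(0)
others = rng.integers(0, 2, size=(50_000, 100))
base = rng.integers(0, 2, size=100)
idx, dist = top_k_cosine(base, others)
```

Here `idx` holds the row numbers of the five nearest vectors, closest first, and `dist` holds the matching distances. Partitioning and then sorting only k items avoids sorting all 50,000 distances, although at this size the matrix-vector product is likely to dominate the running time either way.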