Calculating the minimum distance between two DataFrames

For each item in DF1, I would like to find the item in DF2 that is closest to it.

The distance is the Euclidean distance.

For example, for A in DF1, F in DF2 is the closest one.

>>> DF1
   X  Y name
0  1  2    A
1  3  4    B
2  5  6    C
3  7  8    D
>>> DF2
   X  Y name
0  3  8    E
1  2  4    F
2  1  9    G
3  6  4    H

My code is

import pandas as pd

DF1 = pd.DataFrame({'name': ['A', 'B', 'C', 'D'], 'X': [1, 3, 5, 7], 'Y': [2, 4, 6, 8]})
DF2 = pd.DataFrame({'name': ['E', 'F', 'G', 'H'], 'X': [3, 2, 1, 6], 'Y': [8, 4, 9, 4]})


def ndis(row):
    # Squared Euclidean distance from this row of DF1 to every row of DF2
    X, Y = row['X'], row['Y']
    dis = (DF2.X - X) ** 2 + (DF2.Y - Y) ** 2
    # Look up the name by label; .ix is gone from current pandas
    return DF2.loc[dis.idxmin(), 'name']


DF1['Z'] = DF1.apply(ndis, axis=1)

This works fine, but it takes too long for a large data set.

Another question is how to find the 2nd and 3rd closest ones.

Solution

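There is more than one approach. For example, you can use numpy broadcasting to compute the squared distance between every row of DF1 and every row of DF2:

>>> import numpy
>>> xy = ['X', 'Y']
>>> diff = DF1[xy].values[:, None, :] - DF2[xy].values[None, :, :]
>>> distance_array = numpy.sum(diff**2, axis=2)
>>> distance_array.argmin(axis=1)
array([1, 1, 3, 0])

Row i of distance_array holds the squared distances from row i of DF1 to each row of DF2, so the result is the position in DF2 of the closest row for each row of DF1 (A -> F, B -> F, C -> H, D -> E). Subtracting the two arrays directly, without the added axes, would only compare rows that sit at the same position.

Top 3 closest for each row (probably not the fastest approach, but the simplest):

>>> distance_array.argsort(axis=1)[:, :3]
array([[1, 3, 0],
       [1, 3, 0],
       [3, 0, 1],
       [0, 3, 2]])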
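
The question asks for names rather than positions. The following is a minimal sketch of one way to turn positions into names for every row of DF1. The helper name nearest_names and the output column names nearest_1, nearest_2, ... are made up for illustration and are not part of pandas or numpy.

```python
import numpy as np
import pandas as pd


def nearest_names(df1, df2, k=3, cols=('X', 'Y')):
    """Names of the k rows of df2 closest to each row of df1 (hypothetical helper)."""
    a = df1[list(cols)].to_numpy(dtype=float)
    b = df2[list(cols)].to_numpy(dtype=float)
    # Squared distances, shape (len(df1), len(df2)); the ranking is the
    # same as for true Euclidean distances, so no square root is needed.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    # Positions of the k smallest distances in each row, closest first
    order = np.argsort(d2, axis=1)[:, :k]
    names = df2['name'].to_numpy()[order]
    columns = [f'nearest_{i + 1}' for i in range(k)]
    return pd.DataFrame(names, index=df1.index, columns=columns)


DF1 = pd.DataFrame({'name': ['A', 'B', 'C', 'D'], 'X': [1, 3, 5, 7], 'Y': [2, 4, 6, 8]})
DF2 = pd.DataFrame({'name': ['E', 'F', 'G', 'H'], 'X': [3, 2, 1, 6], 'Y': [8, 4, 9, 4]})

result = nearest_names(DF1, DF2, k=3)
DF1['Z'] = result['nearest_1']
```

Here result has one row per row of DF1, and DF1['Z'] holds the single closest name, as in the question. The intermediate array has len(DF1) * len(DF2) entries, so memory use grows quickly with input size.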

If speed is a concern, run performance tests.
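
Comparing every row of DF1 with every row of DF2 costs time proportional to len(DF1) * len(DF2). A different technique that often scales better for low-dimensional points is a k-d tree. The sketch below assumes SciPy is installed and uses scipy.spatial.cKDTree; this is a swapped-in approach, not the numpy method from the answer, and whether it is actually faster for your data is something to measure.

```python
import pandas as pd
from scipy.spatial import cKDTree

DF1 = pd.DataFrame({'name': ['A', 'B', 'C', 'D'], 'X': [1, 3, 5, 7], 'Y': [2, 4, 6, 8]})
DF2 = pd.DataFrame({'name': ['E', 'F', 'G', 'H'], 'X': [3, 2, 1, 6], 'Y': [8, 4, 9, 4]})

xy = ['X', 'Y']

# Build the tree once over the points we search in (DF2)
tree = cKDTree(DF2[xy].to_numpy())

# For each point of DF1, query the 3 nearest points of DF2.
# dist holds true Euclidean distances, pos holds row positions in DF2;
# both have shape (len(DF1), 3), closest first.
dist, pos = tree.query(DF1[xy].to_numpy(), k=3)

# Translate positions into names
names = DF2['name'].to_numpy()[pos]
DF1['Z'] = names[:, 0]      # closest
DF1['Z2'] = names[:, 1]     # 2nd closest
DF1['Z3'] = names[:, 2]     # 3rd closest
```

The column names Z2 and Z3 are chosen here only for illustration.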