For each item in DF1, I would like to find the item in DF2 that is closest to it.
The distance is the Euclidean distance.
For example, for A in DF1, F in DF2 is the closest one.
>>> DF1
   X  Y name
0  1  2    A
1  3  4    B
2  5  6    C
3  7  8    D
>>> DF2
   X  Y name
0  3  8    E
1  2  4    F
2  1  9    G
3  6  4    H
My code is:
import pandas as pd

DF1 = pd.DataFrame({'name': ['A', 'B', 'C', 'D'], 'X': [1, 3, 5, 7], 'Y': [2, 4, 6, 8]})
DF2 = pd.DataFrame({'name': ['E', 'F', 'G', 'H'], 'X': [3, 2, 1, 6], 'Y': [8, 4, 9, 4]})

def ndis(row):
    # Squared Euclidean distance from this DF1 row to every row of DF2
    X, Y = row['X'], row['Y']
    DF2['DIS'] = (DF2.X - X)**2 + (DF2.Y - Y)**2
    # Select the DF2 row with the smallest distance and return its name
    temp = DF2.loc[DF2.DIS.idxmin()]
    return temp['name']

DF1['Z'] = DF1.apply(ndis, axis=1)
This works, but it takes too long on a large data set.
A second question: how can I find the 2nd and 3rd closest items?
There is more than one approach. For example, with numpy you can compute the squared distance from one point of DF1 (here A, the first row) to every row of DF2:
>>> import numpy
>>> xy = ['X', 'Y']
>>> distance_array = numpy.sum((DF1[xy].values[0] - DF2[xy].values)**2, axis=1)
>>> distance_array.argmin()
1
The three closest (probably not the fastest approach, but the simplest):
>>> distance_array.argsort()[:3]
array([1, 3, 0])
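To handle every row of DF1 at once instead of a single point, one option is numpy broadcasting. The following is a minimal sketch, assuming the DF1 and DF2 from the question; the names p1, p2, sq_dist, order and k are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

DF1 = pd.DataFrame({'name': ['A', 'B', 'C', 'D'], 'X': [1, 3, 5, 7], 'Y': [2, 4, 6, 8]})
DF2 = pd.DataFrame({'name': ['E', 'F', 'G', 'H'], 'X': [3, 2, 1, 6], 'Y': [8, 4, 9, 4]})

xy = ['X', 'Y']
p1 = DF1[xy].values  # shape (n1, 2)
p2 = DF2[xy].values  # shape (n2, 2)

# diff[i, j] = p1[i] - p2[j], so sq_dist[i, j] is the squared
# Euclidean distance between row i of DF1 and row j of DF2.
diff = p1[:, None, :] - p2[None, :, :]
sq_dist = (diff ** 2).sum(axis=2)  # shape (n1, n2)

k = 3  # number of neighbours to keep (assumed value for illustration)
order = np.argsort(sq_dist, axis=1)[:, :k]  # row positions in DF2, nearest first

# Names of the k nearest DF2 items for each DF1 row; column 0 is the closest.
nearest_names = DF2['name'].values[order]
DF1['Z'] = nearest_names[:, 0]
```

Note that the intermediate array has n1 * n2 * 2 elements, so for very large inputs this may need to be processed in chunks.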
If speed is a concern, run performance tests.
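For large data sets, a spatial index is a common alternative to computing every pairwise distance. The sketch below uses scipy.spatial.cKDTree, which is a swapped-in technique and not something the original code uses. It assumes SciPy is installed and reuses the DF1 and DF2 from the question; whether it is actually faster for your data is something to measure.

```python
import pandas as pd
from scipy.spatial import cKDTree

DF1 = pd.DataFrame({'name': ['A', 'B', 'C', 'D'], 'X': [1, 3, 5, 7], 'Y': [2, 4, 6, 8]})
DF2 = pd.DataFrame({'name': ['E', 'F', 'G', 'H'], 'X': [3, 2, 1, 6], 'Y': [8, 4, 9, 4]})

xy = ['X', 'Y']

# Build the tree once over the points being searched (DF2).
tree = cKDTree(DF2[xy].values)

# For every DF1 point, ask for the 3 nearest DF2 points.
# dist holds Euclidean distances (not squared); idx holds row positions in DF2.
dist, idx = tree.query(DF1[xy].values, k=3)

# Column 0 of idx is the closest item, columns 1 and 2 the 2nd and 3rd closest.
DF1['Z'] = DF2['name'].values[idx[:, 0]]
DF1['Z2'] = DF2['name'].values[idx[:, 1]]
DF1['Z3'] = DF2['name'].values[idx[:, 2]]
```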