I have the following dataframe:
A_key Date
A1 2016-05-03
A1 2016-09-25
A2 2015-02-25
A2 2015-02-25
A3 2015-10-04
A3 2016-03-15
A3 2016-04-10
A4 2015-09-26
A4 2015-09-26
For each row, I want the distance in days to its nearest neighbor (n_neighbor k = 1) within the same A_key, such that the output looks like the following:
A_key Date Distance
A1 2016-05-03 145
A1 2016-09-25 145
A2 2015-02-25 0
A2 2015-02-25 0
A3 2015-10-04 163
A3 2016-03-15 26
A3 2016-04-10 26
A4 2015-09-26 0
A4 2015-09-26 0
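This is based on groupby, which splits your original df into one small dataframe per unique key.
Within each group, NumPy broadcasting computes all pairwise date differences at once, which speeds up the whole calculation.
The diagonal (each row compared with itself) is overwritten with a large value so that it is never picked as the minimum.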
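import numpy as np
import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])
l = []
for _, x in df.groupby('A_key'):
    # pairwise absolute differences in whole days within the group
    s = np.abs(x['Date'].values - x['Date'].values[:, None]).astype('timedelta64[D]').astype(int)
    # mask the diagonal so a row is never its own nearest neighbor
    np.fill_diagonal(s, 9999)
    l.append(np.min(s, 1))
# assumes df is sorted by A_key, so the group order matches the row order
df['New'] = np.concatenate(l)
df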
Out[501]:
A_key Date New
0 A1 2016-05-03 145
1 A1 2016-09-25 145
2 A2 2015-02-25 0
3 A2 2015-02-25 0
4 A3 2015-10-04 163
5 A3 2016-03-15 26
6 A3 2016-04-10 26
7 A4 2015-09-26 0
8 A4 2015-09-26 0
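If the rows are not already sorted by A_key, concatenating the per-group arrays would put values on the wrong rows. A minimal self-contained sketch of the same broadcasting idea that keeps the original index is shown below. The helper name `nearest_days` and the column name `Distance` are only illustrative, and the sketch assumes every key has at least two rows (a single-row group would end up with the sentinel value).

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A_key": ["A1", "A1", "A2", "A2", "A3", "A3", "A3", "A4", "A4"],
    "Date": ["2016-05-03", "2016-09-25", "2015-02-25", "2015-02-25",
             "2015-10-04", "2016-03-15", "2016-04-10",
             "2015-09-26", "2015-09-26"],
})
df["Date"] = pd.to_datetime(df["Date"])


def nearest_days(dates: pd.Series) -> pd.Series:
    """Hypothetical helper: days to the nearest other date in one group."""
    # Dates as integer day counts since the epoch
    days = dates.values.astype("datetime64[D]").astype(np.int64)
    # Broadcasting builds the full pairwise |difference| matrix
    diff = np.abs(days[:, None] - days[None, :])
    # Sentinel on the diagonal so a row never matches itself
    np.fill_diagonal(diff, np.iinfo(np.int64).max)
    # Keep the original index so the result aligns on assignment
    return pd.Series(diff.min(axis=1), index=dates.index)


# One result Series per key; pd.concat plus index alignment puts each
# value back on its own row regardless of row order
parts = [nearest_days(g) for _, g in df.groupby("A_key")["Date"]]
df["Distance"] = pd.concat(parts)
print(df)
```

Because the assignment aligns on the index, this variant does not depend on the row order of df. The pairwise matrix is quadratic in the group size, so for very large groups sorting the dates and comparing adjacent values may be a better fit.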