Tags: python, pandas, datediff, nearest-neighbor, date-difference

Nearest neighbor distance for k=1 in units of time


I have the following DataFrame:

    A_key  Date
    A1     2016-05-03
    A1     2016-09-25
    A2     2015-02-25
    A2     2015-02-25
    A3     2015-10-04
    A3     2016-03-15
    A3     2016-04-10
    A4     2015-09-26
    A4     2015-09-26
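To make the question reproducible, the sample input can be rebuilt as a DataFrame. This is a minimal sketch; the column names come from the table, and `Date` is deliberately left as strings because the solution converts it with `pd.to_datetime` itself.

```python
import pandas as pd

# Rebuild the sample input from the question; Date stays as strings for now
df = pd.DataFrame({
    'A_key': ['A1', 'A1', 'A2', 'A2', 'A3', 'A3', 'A3', 'A4', 'A4'],
    'Date': ['2016-05-03', '2016-09-25', '2015-02-25', '2015-02-25',
             '2015-10-04', '2016-03-15', '2016-04-10', '2015-09-26',
             '2015-09-26'],
})
print(df)
```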

For each row, I want the distance in days to its nearest neighbor (k = 1) among the other rows with the same A_key, so that the output looks like the following:

    A_key  Date        Distance
    A1     2016-05-03  145
    A1     2016-09-25  145
    A2     2015-02-25  0
    A2     2015-02-25  0
    A3     2015-10-04  163
    A3     2016-03-15  26
    A3     2016-04-10  26
    A4     2015-09-26  0
    A4     2015-09-26  0
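As a sanity check on the expected output, each distance is the smallest absolute day difference to another date with the same key, and the numbers can be reproduced with the standard library. Note that 2016 is a leap year, so February contributes 29 days to the first A3 gap.

```python
from datetime import date

# A1 has only two dates, so each is the other's nearest neighbour
a1 = (date(2016, 9, 25) - date(2016, 5, 3)).days
# A3: gap from the first date to the middle one, and from the middle to the last
a3_first = (date(2016, 3, 15) - date(2015, 10, 4)).days
a3_last = (date(2016, 4, 10) - date(2016, 3, 15)).days
# The middle date (2016-03-15) takes the smaller of its two gaps
a3_middle = min(a3_first, a3_last)
print(a1, a3_first, a3_middle, a3_last)  # 145 163 26 26
```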

Solution

  • This uses groupby to split the original df into one small DataFrame per unique key, then uses NumPy broadcasting to compute all pairwise day differences within each group at once instead of looping over rows.

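    import numpy as np
    import pandas as pd

    df.Date = pd.to_datetime(df.Date)
    l = []
    for _, x in df.groupby('A_key'):
        d = x['Date'].values
        # Pairwise |d_i - d_j| in whole days via broadcasting (n x n matrix per group)
        s = np.abs(d - d[:, None]).astype('timedelta64[D]').astype(int)
        # Mask each date's distance to itself; a one-row group is left with this sentinel
        np.fill_diagonal(s, 9999)
        # Row-wise minimum = distance to the nearest other date in the same group
        l.append(pd.Series(s.min(axis=1), index=x.index))

    # Assign by index so the result lines up even if df is not sorted by A_key
    df['New'] = pd.concat(l)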
    df
    Out[501]: 
      A_key       Date  New
    0    A1 2016-05-03  145
    1    A1 2016-09-25  145
    2    A2 2015-02-25    0
    3    A2 2015-02-25    0
    4    A3 2015-10-04  163
    5    A3 2016-03-15   26
    6    A3 2016-04-10   26
    7    A4 2015-09-26    0
    8    A4 2015-09-26    0
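Because the dates are one-dimensional, a date's nearest neighbour within its group is always the previous or next date once the group is sorted, so the n x n matrix is not strictly needed. The following is a minimal sketch of that sort-and-diff alternative (a different technique from the broadcasting answer, not part of it), assuming the same `A_key` and `Date` columns; the output column name `Dist` is hypothetical. Unlike the 9999 sentinel, a single-row group would get `<NA>` here.

```python
import pandas as pd

df = pd.DataFrame({
    'A_key': ['A1', 'A1', 'A2', 'A2', 'A3', 'A3', 'A3', 'A4', 'A4'],
    'Date': pd.to_datetime(['2016-05-03', '2016-09-25', '2015-02-25',
                            '2015-02-25', '2015-10-04', '2016-03-15',
                            '2016-04-10', '2015-09-26', '2015-09-26']),
})

# Sort so that dates which are close in time sit next to each other per key
s = df.sort_values(['A_key', 'Date'])
g = s.groupby('A_key')['Date']
# Gap to the previous and to the next date in the same group (NaN at group edges)
prev_gap = g.diff().dt.days
next_gap = g.diff(-1).abs().dt.days
# Nearest neighbour = smaller of the two gaps; min skips NaN by default.
# Assignment aligns on the index, which restores the original row order.
df['Dist'] = pd.concat([prev_gap, next_gap], axis=1).min(axis=1).astype('Int64')
print(df)
```

Sorting makes this O(n log n) per group rather than the O(n^2) memory and time of the pairwise matrix, which mainly matters for groups with many rows.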