Search code examples
pythonpandasvectorization

Pandas vectorization - finding nearest future time in another DataFrame


Say I have two pandas time-series dataframes:

a = pd.DataFrame([1,2,3,4,5,6,7,8,9],index=pd.date_range('2020-01-01', periods=9, freq='10min'))
b = pd.DataFrame(['a','b','c'],index=pd.date_range('2020-01-01', periods=3, freq='30min'))

Is there a way to vectorize the following algorithm, using pandas?

next_b = []
for row in a.itertuples():
    df = b[b.index >= row.Index]
    if len(df.index):
        next_b.append(df.index[0])
    else:
        next_b.append(None)    
a['next_b'] = next_b

output:

                    0   next_b
2020-01-01 00:00:00 1   2020-01-01 00:00:00
2020-01-01 00:10:00 2   2020-01-01 00:30:00
2020-01-01 00:20:00 3   2020-01-01 00:30:00
2020-01-01 00:30:00 4   2020-01-01 00:30:00
2020-01-01 00:40:00 5   2020-01-01 01:00:00
2020-01-01 00:50:00 6   2020-01-01 01:00:00
2020-01-01 01:00:00 7   2020-01-01 01:00:00
2020-01-01 01:10:00 8   NaT
2020-01-01 01:20:00 9   NaT

Solution

  • with merge_asof, you can do

    res = pd.merge_asof(a, b.assign(next_b=b.index), 
                        left_index=True, right_index=True, 
                        direction='forward', suffixes=('','_b'))
    print(res)
    #                      0  0_b              next_b
    # 2020-01-01 00:00:00  1    a 2020-01-01 00:00:00
    # 2020-01-01 00:10:00  2    b 2020-01-01 00:30:00
    # 2020-01-01 00:20:00  3    b 2020-01-01 00:30:00
    # 2020-01-01 00:30:00  4    b 2020-01-01 00:30:00
    # 2020-01-01 00:40:00  5    c 2020-01-01 01:00:00
    # 2020-01-01 00:50:00  6    c 2020-01-01 01:00:00
    # 2020-01-01 01:00:00  7    c 2020-01-01 01:00:00
    # 2020-01-01 01:10:00  8  NaN                 NaT
    # 2020-01-01 01:20:00  9  NaN                 NaT