Search code examples
pythonpandasvectorizationapply

Is there a way to vectorize the following python (pandas) code for faster speed?


I have 2 DataFrames

A_df = pd.DataFrame(data = np.arange(2, 103, 10) + np.random.randn(11), 
                    columns = ['Time(s)'])

B_df = pd.DataFrame(data = zip(range(1, 102), np.random.randn(101)), 
                    columns = ['Time(s)', 'Value'])

A_df.head()

    Time(s)
0   2.751352
1   12.028663
2   20.638388
3   29.821199
4   42.516302

B_df.head()

    Time(s) Value
0   1   1.075801
1   2   0.890754
2   3   -0.015543
3   4   0.085298
4   5   0.208645

I want to add a new column Value of B at T to A_df which is the latest value of B (from B_df) at time t (in A_df).

Currently I am doing this by using apply() method as follows:

A_df['Value of B at T'] = A_df.apply(lambda x: B_df.loc[B_df['Time(s)'] <= x['Time(s)'], 'Value'].values[-1], axis = 1)
A_df
    Time(s) Value of B at T
0   2.751352    -0.891782
1   12.028663   2.416335
2   20.638388   -0.186364
3   29.821199   -0.148716
4   42.516302   0.821272

I was wondering if there is a way to vectorize the functionality being carried out by apply() method to speed up the code?


Solution

  • You can try the following:

    A_df = A_df.assign(
        b_at_t=B_df
        .set_index('Time(s)')
        .reindex(A_df['Time(s)'], method='ffill')
        .values)