Search code examples
pythonpandasdatetimetime-seriesfinance

Merging time series dataframes in python


I am working with some financial tick data. Given two example dataframes like this:

left_df =

Time                          Bid Price    Ask Price
2022-01-02 00:00:01.323597    100          101
2022-01-02 00:00:01.828502    100          101
2022-01-02 00:00:01.845020    100          101
2022-01-02 00:00:03.123567    100          101

right_df =

Time                          Bid Price    Ask Price
2022-01-02 00:00:01.110223    500          501
2022-01-02 00:00:01.800000    500          501
2022-01-02 00:00:03.100000    500          501

If I 'merge' left to right, I want the merged dataframe to look like this:

Time_left                     Time_right                    Bid Price_left    Ask Price_left    Bid Price_right    Ask Price_right
2022-01-02 00:00:01.323597    2022-01-02 00:00:01.110223    100               101               500                501
2022-01-02 00:00:01 828502    2022-01-02 00:00:01.800000    100               101               500                501                               
2022-01-02 00:00:01.845020    2022-01-02 00:00:01.800000    100               101               500                501
2022-01-02 00:00:03.123567    2022-01-02 00:00:03.100000    100               101               500                501

i.e. for each time_left x, get the most recent time_right y up to x, and y can be equal to x.

Whereas if I wanted to 'merge' right to left, the resulting dataframe should look like this:

Time_right                     Time_left                    Bid Price_right    Ask Price_right    Bid Price_left    Ask Price_left
2022-01-02 00:00:01.800000    2022-01-02 00:00:01.323597    500               501               100                101
2022-01-02 00:00:03.100000    2022-01-02 00:00:01.845020    500               501               100                101

What would be the most efficient way of doing this on a dataset that could potentially have tens of millions of rows?


Solution

  • Try this

    # convert to datetime
    left_df['Time'] = pd.to_datetime(left_df['Time'])
    right_df['Time'] = pd.to_datetime(right_df['Time'])
    # insert time_right column
    right_df.insert(1, 'Time_right', right_df['Time'])
    # merge_asof
    df = pd.merge_asof(left_df, right_df, on='Time', suffixes=('_left','_right'))
    print(df)
                            Time  Bid_Price_left  Ask_Price_left                   Time_right  Bid_Price_right  Ask_Price_right  
    0 2022-01-02 00:00:01.323597             100             101   2022-01-02 00:00:01.110223              500              501  
    1 2022-01-02 00:00:01.828502             100             101   2022-01-02 00:00:01.800000              500              501  
    2 2022-01-02 00:00:01.845020             100             101   2022-01-02 00:00:01.800000              500              501  
    3 2022-01-02 00:00:03.123567             100             101   2022-01-02 00:00:03.100000              500              501  
    

    Source