Search code examples
pythonpandasdataframedatetimesubtraction

Subtract rows of two pandas dataframes with most similar dates


I have two pandas dataframes with economic data. Second dataframe contains data with higher granularity.

import datetime
import pandas as pd
df1 = pd.DataFrame(data=[2.88,
                         2.70],
                   index=[datetime.date(2020,11,1),
                          datetime.date(2020,12,1)])

df2 = pd.DataFrame(data=[0.88,
                         0.87,
                         0.90,
                         0.78,
                         0.79,
                         0.83,
                         0.96,
                         0.98,
                         0.89,
                         0.91,
                         0.87,
                         0.88,
                         0.86,
                         0.83,
                         0.86,
                         0.88,
                         0.88,
                         0.84,
                         0.84],
                   index=[datetime.date(2020,10,30),
                          datetime.date(2020,11,2),
                          datetime.date(2020,11,3),
                          datetime.date(2020,11,4),
                          datetime.date(2020,11,5),
                          datetime.date(2020,11,9),
                          datetime.date(2020,11,10),
                          datetime.date(2020,11,12),
                          datetime.date(2020,11,13),
                          datetime.date(2020,11,16),
                          datetime.date(2020,11,17),
                          datetime.date(2020,11,18),
                          datetime.date(2020,11,19),
                          datetime.date(2020,11,20),
                          datetime.date(2020,11,23),
                          datetime.date(2020,11,24),
                          datetime.date(2020,11,25),
                          datetime.date(2020,11,27),
                          datetime.date(2020,11,30)])

I want to get df1 - df2 where rows with most similar date indices are subtracted, e.g.:

df1.loc[datetime.date(2020,11,1)][0] - df2.loc[datetime.date(2020,11,2)][0]
df1.loc[datetime.date(2020,12,1)][0] - df2.loc[datetime.date(2020,11,30)][0]

Solution

  • When working with date/time it is recommended to use Pandas' internal datetime type:

    df1.index = pd.to_datetime(df1.index)
    df2.index = pd.to_datetime(df2.index)
    

    Then you can use merge_asof:

    tmp = pd.merge_asof(df1,df2,left_index=True, right_index=True, direction='nearest')
    
    out = tmp['0_x'] - tmp['0_y']
    

    Output:

    2020-11-01    2.01
    2020-12-01    1.86
    dtype: float64