Search code examples
pythonpandasmergeconcatenationlag

Merging dataframes (with different date time --monthly vs daily) at the same time applying lagged values to one of the dataframes


I have 2 dataframes I wish to merge:

df1 looks like this:

Date          Col1  Col 2  Col 3    Col 4   
    
2016-03     27.57   0.93    28.7    1.57    
2016-04     25.83   0.23    28.34   0.84    
2016-05     24.55   0.27    27.11   0.03    

df2 looks like this:

Date          ColA            

2016-03-21  7.640769230769231   
2016-03-22  7.739720279720279   
2016-03-23  7.577311827956988   
2016-03-24  7.745416666666666   

As you can see, df1 is a monthly data and df2 is a daily data. However, I want to merge them in a daily format (following df2) but I also want df1 to be lagged (lag = -30)

This is my desired output:

Output:

   Date          ColA              Col1  Col 2  Col 3    Col 4     
      
    2016-03-21  7.640769230769231   25.83   0.23    28.34   0.84
    2016-03-22  7.739720279720279   25.83   0.23    28.34   0.84
    2016-03-23  7.577311827956988   25.83   0.23    28.34   0.84
    2016-03-24  7.745416666666666   25.83   0.23    28.34   0.84


....2016-04-01  xxxxxxxx            24.55   0.27    27.11   0.03

I tried this but, they just merge and the lags were not applied.

out = (df2.merge(df1.shift(-30), on='Date').axis=1)

EDIT: Since I can't make the below suggestions work on my specific problem, what I did is this:

   out['Col1']= out['Col1'].shift(7).dropna()
    out = pd.merge_asof(df1, df2,  on = 'Date')

This is to lag only 1 column (which I decided to do since most columns have possibility to have different lag times.


Solution

  • For merge use merge_asof:

    print (df1)
          Date   Col1  Col 2  Col 3  Col 4
    0  2016-03  27.57   0.93  28.70   1.57
    1  2016-04  25.83   0.23  28.34   0.84
    2  2016-05  24.55   0.27  27.11   0.03
    
    df1['Date'] = pd.to_datetime(df1['Date'])
    print (df1)
            Date   Col1  Col 2  Col 3  Col 4
    0 2016-03-01  27.57   0.93  28.70   1.57
    1 2016-04-01  25.83   0.23  28.34   0.84
    2 2016-05-01  24.55   0.27  27.11   0.03
        
    

    np.random.seed(20)
    df2 = pd.DataFrame({'Date': pd.date_range('2016-02-20', periods=20),
                        'ColA': np.random.randint(5, size=20)})
        
    print (df2)
             Date  ColA
    0  2016-02-20     3
    1  2016-02-21     2
    2  2016-02-22     4
    3  2016-02-23     2
    4  2016-02-24     1
    5  2016-02-25     4
    6  2016-02-26     3
    7  2016-02-27     2
    8  2016-02-28     0
    9  2016-02-29     0
    10 2016-03-01     2
    11 2016-03-02     2
    12 2016-03-03     3
    13 2016-03-04     3
    14 2016-03-05     0
    15 2016-03-06     0
    16 2016-03-07     1
    17 2016-03-08     2
    18 2016-03-09     2
    19 2016-03-10     3
    

    #default direction='backward'
    df = pd.merge_asof(df2, df1, on='Date', tolerance=pd.Timedelta(7, 'days'))
    print (df)
             Date  ColA   Col1  Col 2  Col 3  Col 4
    0  2016-02-20     3    NaN    NaN    NaN    NaN
    1  2016-02-21     2    NaN    NaN    NaN    NaN
    2  2016-02-22     4    NaN    NaN    NaN    NaN
    3  2016-02-23     2    NaN    NaN    NaN    NaN
    4  2016-02-24     1    NaN    NaN    NaN    NaN
    5  2016-02-25     4    NaN    NaN    NaN    NaN
    6  2016-02-26     3    NaN    NaN    NaN    NaN
    7  2016-02-27     2    NaN    NaN    NaN    NaN
    8  2016-02-28     0    NaN    NaN    NaN    NaN
    9  2016-02-29     0    NaN    NaN    NaN    NaN
    10 2016-03-01     2  27.57   0.93   28.7   1.57
    11 2016-03-02     2  27.57   0.93   28.7   1.57
    12 2016-03-03     3  27.57   0.93   28.7   1.57
    13 2016-03-04     3  27.57   0.93   28.7   1.57
    14 2016-03-05     0  27.57   0.93   28.7   1.57
    15 2016-03-06     0  27.57   0.93   28.7   1.57
    16 2016-03-07     1  27.57   0.93   28.7   1.57
    17 2016-03-08     2  27.57   0.93   28.7   1.57
    18 2016-03-09     2    NaN    NaN    NaN    NaN
    19 2016-03-10     3    NaN    NaN    NaN    NaN
    

    df = pd.merge_asof(df2, df1, on='Date', 
                       tolerance=pd.Timedelta(7, 'days'), direction='forward')
    print (df)
             Date  ColA   Col1  Col 2  Col 3  Col 4
    0  2016-02-20     3    NaN    NaN    NaN    NaN
    1  2016-02-21     2    NaN    NaN    NaN    NaN
    2  2016-02-22     4    NaN    NaN    NaN    NaN
    3  2016-02-23     2  27.57   0.93   28.7   1.57
    4  2016-02-24     1  27.57   0.93   28.7   1.57
    5  2016-02-25     4  27.57   0.93   28.7   1.57
    6  2016-02-26     3  27.57   0.93   28.7   1.57
    7  2016-02-27     2  27.57   0.93   28.7   1.57
    8  2016-02-28     0  27.57   0.93   28.7   1.57
    9  2016-02-29     0  27.57   0.93   28.7   1.57
    10 2016-03-01     2  27.57   0.93   28.7   1.57
    11 2016-03-02     2    NaN    NaN    NaN    NaN
    12 2016-03-03     3    NaN    NaN    NaN    NaN
    13 2016-03-04     3    NaN    NaN    NaN    NaN
    14 2016-03-05     0    NaN    NaN    NaN    NaN
    15 2016-03-06     0    NaN    NaN    NaN    NaN
    16 2016-03-07     1    NaN    NaN    NaN    NaN
    17 2016-03-08     2    NaN    NaN    NaN    NaN
    18 2016-03-09     2    NaN    NaN    NaN    NaN
    19 2016-03-10     3    NaN    NaN    NaN    NaN
    

    df = pd.merge_asof(df2, df1, on='Date', 
                       tolerance=pd.Timedelta(7, 'days'), direction='nearest')
    print (df)
             Date  ColA   Col1  Col 2  Col 3  Col 4
    0  2016-02-20     3    NaN    NaN    NaN    NaN
    1  2016-02-21     2    NaN    NaN    NaN    NaN
    2  2016-02-22     4    NaN    NaN    NaN    NaN
    3  2016-02-23     2  27.57   0.93   28.7   1.57
    4  2016-02-24     1  27.57   0.93   28.7   1.57
    5  2016-02-25     4  27.57   0.93   28.7   1.57
    6  2016-02-26     3  27.57   0.93   28.7   1.57
    7  2016-02-27     2  27.57   0.93   28.7   1.57
    8  2016-02-28     0  27.57   0.93   28.7   1.57
    9  2016-02-29     0  27.57   0.93   28.7   1.57
    10 2016-03-01     2  27.57   0.93   28.7   1.57
    11 2016-03-02     2  27.57   0.93   28.7   1.57
    12 2016-03-03     3  27.57   0.93   28.7   1.57
    13 2016-03-04     3  27.57   0.93   28.7   1.57
    14 2016-03-05     0  27.57   0.93   28.7   1.57
    15 2016-03-06     0  27.57   0.93   28.7   1.57
    16 2016-03-07     1  27.57   0.93   28.7   1.57
    17 2016-03-08     2  27.57   0.93   28.7   1.57
    18 2016-03-09     2    NaN    NaN    NaN    NaN
    19 2016-03-10     3    NaN    NaN    NaN    NaN