I have 2 dataframes I wish to merge:
df1 looks like this:
Date Col1 Col 2 Col 3 Col 4
2016-03 27.57 0.93 28.7 1.57
2016-04 25.83 0.23 28.34 0.84
2016-05 24.55 0.27 27.11 0.03
df2 looks like this:
Date ColA
2016-03-21 7.640769230769231
2016-03-22 7.739720279720279
2016-03-23 7.577311827956988
2016-03-24 7.745416666666666
As you can see, df1 is a monthly data and df2 is a daily data. However, I want to merge them in a daily format (following df2) but I also want df1 to be lagged (lag = -30)
This is my desired output:
Output:
Date ColA Col1 Col 2 Col 3 Col 4
2016-03-21 7.640769230769231 25.83 0.23 28.34 0.84
2016-03-22 7.739720279720279 25.83 0.23 28.34 0.84
2016-03-23 7.577311827956988 25.83 0.23 28.34 0.84
2016-03-24 7.745416666666666 25.83 0.23 28.34 0.84
....2016-04-01 xxxxxxxx 24.55 0.27 27.11 0.03
I tried this but, they just merge and the lags were not applied.
out = (df2.merge(df1.shift(-30), on='Date').axis=1)
EDIT: Since I can't make the below suggestions work on my specific problem, what I did is this:
out['Col1']= out['Col1'].shift(7).dropna()
out = pd.merge_asof(df1, df2, on = 'Date')
This is to lag only 1 column (which I decided to do since most columns have possibility to have different lag times.
For merge use merge_asof
:
print (df1)
Date Col1 Col 2 Col 3 Col 4
0 2016-03 27.57 0.93 28.70 1.57
1 2016-04 25.83 0.23 28.34 0.84
2 2016-05 24.55 0.27 27.11 0.03
df1['Date'] = pd.to_datetime(df1['Date'])
print (df1)
Date Col1 Col 2 Col 3 Col 4
0 2016-03-01 27.57 0.93 28.70 1.57
1 2016-04-01 25.83 0.23 28.34 0.84
2 2016-05-01 24.55 0.27 27.11 0.03
np.random.seed(20)
df2 = pd.DataFrame({'Date': pd.date_range('2016-02-20', periods=20),
'ColA': np.random.randint(5, size=20)})
print (df2)
Date ColA
0 2016-02-20 3
1 2016-02-21 2
2 2016-02-22 4
3 2016-02-23 2
4 2016-02-24 1
5 2016-02-25 4
6 2016-02-26 3
7 2016-02-27 2
8 2016-02-28 0
9 2016-02-29 0
10 2016-03-01 2
11 2016-03-02 2
12 2016-03-03 3
13 2016-03-04 3
14 2016-03-05 0
15 2016-03-06 0
16 2016-03-07 1
17 2016-03-08 2
18 2016-03-09 2
19 2016-03-10 3
#default direction='backward'
df = pd.merge_asof(df2, df1, on='Date', tolerance=pd.Timedelta(7, 'days'))
print (df)
Date ColA Col1 Col 2 Col 3 Col 4
0 2016-02-20 3 NaN NaN NaN NaN
1 2016-02-21 2 NaN NaN NaN NaN
2 2016-02-22 4 NaN NaN NaN NaN
3 2016-02-23 2 NaN NaN NaN NaN
4 2016-02-24 1 NaN NaN NaN NaN
5 2016-02-25 4 NaN NaN NaN NaN
6 2016-02-26 3 NaN NaN NaN NaN
7 2016-02-27 2 NaN NaN NaN NaN
8 2016-02-28 0 NaN NaN NaN NaN
9 2016-02-29 0 NaN NaN NaN NaN
10 2016-03-01 2 27.57 0.93 28.7 1.57
11 2016-03-02 2 27.57 0.93 28.7 1.57
12 2016-03-03 3 27.57 0.93 28.7 1.57
13 2016-03-04 3 27.57 0.93 28.7 1.57
14 2016-03-05 0 27.57 0.93 28.7 1.57
15 2016-03-06 0 27.57 0.93 28.7 1.57
16 2016-03-07 1 27.57 0.93 28.7 1.57
17 2016-03-08 2 27.57 0.93 28.7 1.57
18 2016-03-09 2 NaN NaN NaN NaN
19 2016-03-10 3 NaN NaN NaN NaN
df = pd.merge_asof(df2, df1, on='Date',
tolerance=pd.Timedelta(7, 'days'), direction='forward')
print (df)
Date ColA Col1 Col 2 Col 3 Col 4
0 2016-02-20 3 NaN NaN NaN NaN
1 2016-02-21 2 NaN NaN NaN NaN
2 2016-02-22 4 NaN NaN NaN NaN
3 2016-02-23 2 27.57 0.93 28.7 1.57
4 2016-02-24 1 27.57 0.93 28.7 1.57
5 2016-02-25 4 27.57 0.93 28.7 1.57
6 2016-02-26 3 27.57 0.93 28.7 1.57
7 2016-02-27 2 27.57 0.93 28.7 1.57
8 2016-02-28 0 27.57 0.93 28.7 1.57
9 2016-02-29 0 27.57 0.93 28.7 1.57
10 2016-03-01 2 27.57 0.93 28.7 1.57
11 2016-03-02 2 NaN NaN NaN NaN
12 2016-03-03 3 NaN NaN NaN NaN
13 2016-03-04 3 NaN NaN NaN NaN
14 2016-03-05 0 NaN NaN NaN NaN
15 2016-03-06 0 NaN NaN NaN NaN
16 2016-03-07 1 NaN NaN NaN NaN
17 2016-03-08 2 NaN NaN NaN NaN
18 2016-03-09 2 NaN NaN NaN NaN
19 2016-03-10 3 NaN NaN NaN NaN
df = pd.merge_asof(df2, df1, on='Date',
tolerance=pd.Timedelta(7, 'days'), direction='nearest')
print (df)
Date ColA Col1 Col 2 Col 3 Col 4
0 2016-02-20 3 NaN NaN NaN NaN
1 2016-02-21 2 NaN NaN NaN NaN
2 2016-02-22 4 NaN NaN NaN NaN
3 2016-02-23 2 27.57 0.93 28.7 1.57
4 2016-02-24 1 27.57 0.93 28.7 1.57
5 2016-02-25 4 27.57 0.93 28.7 1.57
6 2016-02-26 3 27.57 0.93 28.7 1.57
7 2016-02-27 2 27.57 0.93 28.7 1.57
8 2016-02-28 0 27.57 0.93 28.7 1.57
9 2016-02-29 0 27.57 0.93 28.7 1.57
10 2016-03-01 2 27.57 0.93 28.7 1.57
11 2016-03-02 2 27.57 0.93 28.7 1.57
12 2016-03-03 3 27.57 0.93 28.7 1.57
13 2016-03-04 3 27.57 0.93 28.7 1.57
14 2016-03-05 0 27.57 0.93 28.7 1.57
15 2016-03-06 0 27.57 0.93 28.7 1.57
16 2016-03-07 1 27.57 0.93 28.7 1.57
17 2016-03-08 2 27.57 0.93 28.7 1.57
18 2016-03-09 2 NaN NaN NaN NaN
19 2016-03-10 3 NaN NaN NaN NaN