I have two dataframes df1 and df2, where df1 has a datetime index based on days, and df2 has two date columns 'wk start' and 'wk end' that are weekly ranges as well as one data column 'statistic' that stores data corresponding to the week range.
I would like to add to df1 a column for 'statistic' whereby I lookup each date (on a daily basis, i.e. each row) and try to find the corresponding 'statistic' depending on the week that this date falls into.
I believe the answer would require merging df2 into df1 but I'm lost as to how to proceed after that.
df1: (note: I skipped the rows between 2019-06-12 and 2019-06-16 to keep the example short.)
date | age |
---|---|
2019-06-10 | 20 |
2019-06-11 | 21 |
2019-06-17 | 19 |
2019-06-18 | 18 |
df2:
wk start | wk end | statistic |
---|---|---|
2019-06-10 | 2019-06-14 | 102 |
2019-06-17 | 2019-06-21 | 100 |
2019-06-24 | 2019-06-28 | 547 |
2019-07-02 | 2019-07-25 | 268 |
Desired output:
date | age | statistic |
---|---|---|
2019-06-10 | 20 | 102 |
2019-06-11 | 21 | 102 |
2019-06-17 | 19 | 100 |
2019-06-18 | 18 | 100 |
code for the dataframes d1 and d2
import pandas as pd
import datetime
data1 = {
'date': ['2019-06-10', '2019-06-11', '2019-06-17', '2019-06-18'],
'age': [20, 21, 19, 18]
}
data1['date'] = pd.to_datetime(data1['date'])
df1 = pd.DataFrame(data1)
df1.set_index('date', inplace=True)
data2 = {
'wk start': ['2019-06-10', '2019-06-17', '2019-06-24', '2019-07-02'],
'wk end': ['2019-06-14', '2019-06-21', '2019-06-28', '2019-07-05'],
'height': [120, 121, 119, 118]
}
data2['wk start'] = pd.to_datetime(data2['wk start'])
data2['wk end'] = pd.to_datetime(data2['wk end'])
df2 = pd.DataFrame(data2)
You can firstly reset_index()
on df1
to get the date
row index back to data column. Then, cross join df1
and df2
by .merge()
with how='cross'
and then filter the result by date
field is between wk start
and wk end
using .between()
, as follows;
df_merge = df1.reset_index().merge(df2, how='cross')
df_out = df_merge[df_merge['date'].between(df_merge['wk start'], df_merge['wk end'])]
Or, if your Pandas version is < 1.2.0 (released in December 2020)
df_merge = df1.reset_index().assign(key=1).merge(df2.assign(key=1), on='key').drop('key', axis=1)
df_out = df_merge[df_merge['date'].between(df_merge['wk start'], df_merge['wk end'])]
Result:
print(df_out)
date age wk start wk end height
0 2019-06-10 20 2019-06-10 2019-06-14 120
4 2019-06-11 21 2019-06-10 2019-06-14 120
9 2019-06-17 19 2019-06-17 2019-06-21 121
13 2019-06-18 18 2019-06-17 2019-06-21 121
You can further remove the 2 columns wk start
wk end
and set column date
as index by:
df_out = df_out.drop(['wk start', 'wk end'], axis=1).set_index('date')
Result:
print(df_out)
age height
date
2019-06-10 20 120
2019-06-11 21 120
2019-06-17 19 121
2019-06-18 18 121