I have a dataset containing a date index in the form MMMM-YY, the start date of a promotion, the discount value and the end date of the promotion.
As follows:
events = pd.DataFrame({'yyyyww': ['2022-01','2022-02','2022-03', '2022-04','2022-05','2022-06','2022-07','2022-08','2022-09','2022-10'],
'promo_start': ['2022-01','Nan','2022-03','Nan','2022-05','2022-06','Nan','Nan','2022-09','Nan'],
'disc': ['0.1','Nan',0.2,'Nan',0.2,0.4,'Nan','Nan',0.5,'NaN'],
'promo_end': ['Nan', '2022-02','Nan','2022-04','2022-05','Nan','2022-07','Nan','Nan','2022-10']})
I have attempted various combined groupby
and ffill
operations but I am unable to produce the desired output.
For each week in YYYY-MM
I would like to be able to assess whether the promo was active by doing something akin to a lead fill operation such that the output is a dataframe with a boolean flag and the discount amount, as follows;
desired_output = pd.DataFrame({'yyyyww': ['2022-01','2022-02','2022-03', '2022-04','2022-05','2022-06','2022-07','2022-08','2022-09','2022-10'],
'promo_start': ['2022-01','Nan','2022-03','Nan','2022-05','2022-06','Nan','Nan','2022-09','Nan'],
'disc': ['0.1','Nan',0.2,'Nan',0.2,0.4,'Nan','Nan',0.5,'NaN'],
'promo_end': ['Nan', '2022-02','Nan','2022-04','2022-05','Nan','2022-07','Nan','Nan','2022-10'],
'promo_active': [True,True,True,True,True,True,True,False,True,True],
'promo_disc': [0.1,0.1,0.2,0.2,0.2,0.4,0.4,0,0.5,0.5]})
Forward fill the promo_start
then group the promo_end
per promo start and backward fill the values, now check for the nulls to determine whether promo is active
g = events['promo_start'].ffill()
events['promo_active'] = events['promo_end'].groupby(g).bfill().notna()
yyyyww promo_start disc promo_end promo_active
0 2022-01 2022-01 0.1 None True
1 2022-02 None None 2022-02 True
2 2022-03 2022-03 0.2 None True
3 2022-04 None None 2022-04 True
4 2022-05 2022-05 0.2 2022-05 True
5 2022-06 2022-06 0.4 None True
6 2022-07 None None 2022-07 True
7 2022-08 None None None False
8 2022-09 2022-09 0.5 None True
9 2022-10 None NaN 2022-10 True