Search code examples
pythonpandasdataframefillna

Lead fill function in pandas using condition from another column


I have a dataset containing a date index in the form MMMM-YY, the start date of a promotion, the discount value and the end date of the promotion.

As follows:

events = pd.DataFrame({'yyyyww': ['2022-01','2022-02','2022-03', '2022-04','2022-05','2022-06','2022-07','2022-08','2022-09','2022-10'],
                         'promo_start': ['2022-01','Nan','2022-03','Nan','2022-05','2022-06','Nan','Nan','2022-09','Nan'],
                         'disc': ['0.1','Nan',0.2,'Nan',0.2,0.4,'Nan','Nan',0.5,'NaN'],
                         'promo_end': ['Nan', '2022-02','Nan','2022-04','2022-05','Nan','2022-07','Nan','Nan','2022-10']})

enter image description here

I have attempted various combined groupby and ffill operations but I am unable to produce the desired output.

For each week in YYYY-MM I would like to be able to assess whether the promo was active by doing something akin to a lead fill operation such that the output is a dataframe with a boolean flag and the discount amount, as follows;

desired_output = pd.DataFrame({'yyyyww': ['2022-01','2022-02','2022-03', '2022-04','2022-05','2022-06','2022-07','2022-08','2022-09','2022-10'],
                         'promo_start': ['2022-01','Nan','2022-03','Nan','2022-05','2022-06','Nan','Nan','2022-09','Nan'],
                         'disc': ['0.1','Nan',0.2,'Nan',0.2,0.4,'Nan','Nan',0.5,'NaN'],
                         'promo_end': ['Nan', '2022-02','Nan','2022-04','2022-05','Nan','2022-07','Nan','Nan','2022-10'],
                         'promo_active': [True,True,True,True,True,True,True,False,True,True],
                         'promo_disc': [0.1,0.1,0.2,0.2,0.2,0.4,0.4,0,0.5,0.5]})

enter image description here


Solution

  • Forward fill the promo_start then group the promo_end per promo start and backward fill the values, now check for the nulls to determine whether promo is active

    g = events['promo_start'].ffill()
    events['promo_active'] = events['promo_end'].groupby(g).bfill().notna()
    

        yyyyww promo_start  disc promo_end  promo_active
    0  2022-01     2022-01   0.1      None          True
    1  2022-02        None  None   2022-02          True
    2  2022-03     2022-03   0.2      None          True
    3  2022-04        None  None   2022-04          True
    4  2022-05     2022-05   0.2   2022-05          True
    5  2022-06     2022-06   0.4      None          True
    6  2022-07        None  None   2022-07          True
    7  2022-08        None  None      None         False
    8  2022-09     2022-09   0.5      None          True
    9  2022-10        None   NaN   2022-10          True