Search code examples
pythonpandasdataframeconditional-statementsshift

Conditional Shift in Pandas [calculate last win date]


Help with conditional shifting in Pandas

I have the following dataframe:

rng = pd.date_range('2015-02-24', periods=10, freq='D')
win_list = ['Won', 'Lost', 'Won', 'Won', 'Lost', 'Won', 'Lost', 'Lost', 'Lost', 'Won']
cust_list = ['A', 'A', 'B', 'C', 'C', 'C', 'B', 'A', 'A', 'B']
output = ['2015-02-24', '2015-02-24', '2015-02-26', '2015-02-27', '2015-02-27', 
          '2015-03-01', '2015-02-26', '2015-02-24', '2015-02-24', '2015-03-05']
df = pd.DataFrame({'TodayDate': rng, 'Status': win_list, 'Customer': cust_list, 'Last Win Datetime' : output})
df

which looks like

TodayDate   Status  Customer    Last Win Datetime
2015-02-24  Won     A           2015-02-24
2015-02-25  Lost    A           2015-02-24
2015-02-26  Won     B           2015-02-26
2015-02-27  Won     C           2015-02-27
2015-02-28  Lost    C           2015-02-27
2015-03-01  Won     C           2015-03-01
2015-03-02  Lost    B           2015-02-26
2015-03-03  Lost    A           2015-02-24
2015-03-04  Lost    A           2015-02-24
2015-03-05  Won     B           2015-03-05

I am trying to compute the 'Last Win Datetime' column. This column should return the most recent TodayDate when a Customer had a Won status.

I have tried the solution posted Pandas and Python Dataframes and Conditional Shift Function but this seems to return the most recent date for a matching Status (or car type from the example).

Any help is much appreciated!


Solution

  • Use Series.where for replace rows with no Won to missing values, so possible use GroupBy.ffill:

    df['Last Win Datetime'] = (df['TodayDate'].where(df['Status'].eq('Won'))
                                              .groupby(df['Customer'])
                                              .ffill())
    print(df)
       TodayDate Status Customer Last Win Datetime
    0 2015-02-24    Won        A        2015-02-24
    1 2015-02-25   Lost        A        2015-02-24
    2 2015-02-26    Won        B        2015-02-26
    3 2015-02-27    Won        C        2015-02-27
    4 2015-02-28   Lost        C        2015-02-27
    5 2015-03-01    Won        C        2015-03-01
    6 2015-03-02   Lost        B        2015-02-26
    7 2015-03-03   Lost        A        2015-02-24
    8 2015-03-04   Lost        A        2015-02-24
    9 2015-03-05    Won        B        2015-03-05
    

    Details:

    print(df['TodayDate'].where(df['Status'].eq('Won')))
    0   2015-02-24
    1          NaT
    2   2015-02-26
    3   2015-02-27
    4          NaT
    5   2015-03-01
    6          NaT
    7          NaT
    8          NaT
    9   2015-03-05
    Name: TodayDate, dtype: datetime64[ns]