Search code examples
pythonpandasdataframerowcumulative-sum

Python Pandas, Running Sum, based on previous rows value and grouped


I have a pandas dataframe along these lines, based on where a customer service case sits before being closed. Every time the case is edited and audit trial is captured. I want to generate a counter for each time the Department of a case changes from the department it was previously in.

ID Department Start Date End Date
A Sales 01/01/2022 02/01/2022
A Sales 02/01/2022 03/01/2022
A Operations 03/01/2022 04/01/2022
A Sales 04/01/2022 05/01/2022
B Finance 01/01/2022 02/01/2022
B Risk 02/01/2022 03/01/2022

The output I want to achieve is shown below, the part I am struggling with is getting the 'Count of Department Change' value to increase when the ticket returns to a department it has already been in.

ID Department Start Date End Date Count of Department Change
A Sales 01/01/2022 02/01/2022 0
A Sales 02/01/2022 03/01/2022 0
A Operations 03/01/2022 04/01/2022 1
A Sales 04/01/2022 05/01/2022 2
B Finance 01/01/2022 02/01/2022 0
B Risk 02/01/2022 03/01/2022 1

Using the following code I am able to flag when the department changes for a given case.

df['Dept_Change_Count'] = np.where((df['Department'] != df['Department'].shift(1)) & (df['ID'] == df['ID'].shift(1)), '1', '0')

I'm thinking I could use the df['Dept_Change_Count'] and a running sum along the ID to generate the output I'm after but I haven't had much luck so far.

Any help greatly appreciated!


Solution

  • Compare previous and current row in Department per ID then again group by ID and calculate cumsum to generate counter

    m = df['Department'] != df.groupby('ID')['Department'].shift()
    df['Dept_Change_Count'] = m.groupby(df['ID']).cumsum() - 1
    

    Alternative approach using a single groupby with lambda func to calculate cumsum:

    df['Dept_Change_Count'] = df.groupby('ID')['Department']\
                                .apply(lambda s: (s != s.shift()).cumsum()) - 1
    

      ID  Department  Start Date    End Date  Dept_Change_Count
    0  A       Sales  01/01/2022  02/01/2022                  0
    1  A       Sales  02/01/2022  03/01/2022                  0
    2  A  Operations  03/01/2022  04/01/2022                  1
    3  A       Sales  04/01/2022  05/01/2022                  2
    4  B     Finance  01/01/2022  02/01/2022                  0
    5  B        Risk  02/01/2022  03/01/2022                  1