Search code examples
pythonpandasunique

Create new column with subtotal for first instance of unique value in column, 0 otherwise


I have a data set that is as follows:

  ClientID        Date  Total sales  Cumulative Sales
0        A  01-01-2000          100               300
1        A  01-02-2000          100                 0
2        A  01-03-2000          100                 0
3        B  01-01-2000           50               100
4        B  01-02-2000           50                 0
5        C  01-01-2000           70               210
6        C  01-02-2000           70                 0
7        C  01-03-2000           70                 0
8        D  01-01-2000           20                40
9        D  01-02-2000           20                 0

I want to figure out a way to create the Cumulative Sales column, that will be the sum of Total sales for the first row of each unique ClientID, and 0 for the rest.

I figured out the solution in excel by creating a new column R for counting first distinct value and then using the IF(SUMIF) function to calculate the sum:

=IF(COUNTIFS($F$2:$F8, $F8)=1, SUMIF($F$2:$F$1065, R8, $P$2:$P$1065), "0")

Solution

  • Use a groupby.transform and mask the non-first values based on the duplicated IDs:

    df['Cumulative Sales'] = (df.groupby('Client ID')['Total sales'].transform('sum')
                                .mask(df['Client ID'].duplicated(), 0)
                             )
    

    Output:

      Client ID        Date  Total sales  Cumulative Sales
    0         A  01-01-2000          100               300
    1         A  01-02-2000          100                 0
    2         A  01-03-2000          100                 0
    3         B  01-01-2000           50               100
    4         B  01-02-2000           50                 0
    5         C  01-01-2000           70               210
    6         C  01-02-2000           70                 0
    7         C  01-03-2000           70                 0
    8         D  01-01-2000           20                40
    9         D  01-02-2000           20                 0