I have a data set that is as follows:
ClientID Date Total sales Cumulative Sales
0 A 01-01-2000 100 300
1 A 01-02-2000 100 0
2 A 01-03-2000 100 0
3 B 01-01-2000 50 100
4 B 01-02-2000 50 0
5 C 01-01-2000 70 210
6 C 01-02-2000 70 0
7 C 01-03-2000 70 0
8 D 01-01-2000 20 40
9 D 01-02-2000 20 0
I want to figure out a way to create the Cumulative Sales
column, that will be the sum of Total sales
for the first row of each unique ClientID
, and 0
for the rest.
I figured out the solution in excel by creating a new column R for counting first distinct value and then using the IF(SUMIF)
function to calculate the sum:
=IF(COUNTIFS($F$2:$F8, $F8)=1, SUMIF($F$2:$F$1065, R8, $P$2:$P$1065), "0")
Use a groupby.transform
and mask
the non-first values based on the duplicated
IDs:
df['Cumulative Sales'] = (df.groupby('Client ID')['Total sales'].transform('sum')
.mask(df['Client ID'].duplicated(), 0)
)
Output:
Client ID Date Total sales Cumulative Sales
0 A 01-01-2000 100 300
1 A 01-02-2000 100 0
2 A 01-03-2000 100 0
3 B 01-01-2000 50 100
4 B 01-02-2000 50 0
5 C 01-01-2000 70 210
6 C 01-02-2000 70 0
7 C 01-03-2000 70 0
8 D 01-01-2000 20 40
9 D 01-02-2000 20 0