I have a dataframe
that I am trying to use pandas.groupby
on to get the cumulative sum. The values that I am grouping by show up in two different columns, and I am having trouble getting the groupby to work correctly. My starting dataframe
is:
df = pd.DataFrame({'col_A': ['red', 'red', 'blue', 'red'], 'col_B': ['blue', 'red', 'blue', 'red'], 'col_A_qty': [1, 1, 1, 1], 'col_B_qty': [1, 1, 1, 1]})
col_A col_B col_A_qty col_B_qty
red blue 1 1
red red 1 1
blue blue 1 1
red red 1 1
The result I am trying to get is:
col_A col_B col_A_qty col_B_qty red_cumsum blue_cumsum
red blue 1 1 1 1
red red 1 1 3 1
blue blue 1 1 3 3
red red 1 1 5 3
I've tried:
df.groupby(['col_A', 'col_B'])['col_A_qty'].cumsum()
but this groups on the combination of col_A
and col_B
. How can I use pandas.groupby
to calculate the cumulative sum of red and blue, regardless of if it's in col_A
or col_B
?
Try two pivot
out = pd.pivot(df,columns='col_A',values='col_A_qty').fillna(0).cumsum().add(pd.pivot(df,columns='col_B',values='col_B_qty').fillna(0).cumsum(),fill_value=0)
Out[404]:
col_A blue red
0 1.0 1.0
1 1.0 3.0
2 3.0 3.0
3 3.0 5.0
df = df.join(out)