Search code examples
pythonpython-3.xpandaspandas-groupbycumsum

Pandas Groupby Based on Values in Multiple Columns


I have a dataframe that I am trying to use pandas.groupby on to get the cumulative sum. The values that I am grouping by show up in two different columns, and I am having trouble getting the groupby to work correctly. My starting dataframe is:

df = pd.DataFrame({'col_A': ['red', 'red', 'blue', 'red'], 'col_B': ['blue', 'red', 'blue', 'red'], 'col_A_qty': [1, 1, 1, 1], 'col_B_qty': [1, 1, 1, 1]})

col_A   col_B   col_A_qty   col_B_qty
red      blue      1           1
red      red       1           1
blue    blue       1           1
red      red       1           1

The result I am trying to get is:

col_A   col_B   col_A_qty   col_B_qty   red_cumsum  blue_cumsum
red     blue       1            1           1           1
red     red        1            1           3           1
blue    blue       1            1           3           3
red     red        1            1           5           3

I've tried:

df.groupby(['col_A', 'col_B'])['col_A_qty'].cumsum()

but this groups on the combination of col_A and col_B. How can I use pandas.groupby to calculate the cumulative sum of red and blue, regardless of if it's in col_A or col_B?


Solution

  • Try two pivot

    out = pd.pivot(df,columns='col_A',values='col_A_qty').fillna(0).cumsum().add(pd.pivot(df,columns='col_B',values='col_B_qty').fillna(0).cumsum(),fill_value=0)
    Out[404]: 
    col_A  blue  red
    0       1.0  1.0
    1       1.0  3.0
    2       3.0  3.0
    3       3.0  5.0
    df = df.join(out)