Tags: python, pandas, dataframe, group-by, sum

pandas DataFrame: group by consecutive duplicates and sum the values


I'm confused about how to use groupby() on a pandas DataFrame to group consecutive duplicates in one column and sum the values in another column.

Let's say I have the following DataFrame df:

index   type    value
  0    profit    11     
  1    profit    10
  2    loss      -5
  3    profit    50
  4    profit    15
  5    loss     -30
  6    loss     -25
  7    loss     -10

What I'm looking for is:

index   type    grand
  0    profit    21  # total of 11 + 10 = 21
  1    loss      -5  # unchanged: this row has no consecutive duplicate
  2    profit    65  # total of 50 + 15 = 65
  3    loss     -65  # total of -30 -25 -10 = -65

What I tried to do:

 df['grand'] = df.groupby(df['type'].ne(df['type'].shift()).cumsum()).cumcount()

but that gives me a running count within each run of consecutive duplicates, not the sum.

I also tried several approaches that iterate over the rows, but they all failed.

Thanks so much!


Solution

  • Your grouping key is already correct. Instead of .cumcount(), aggregate each group, taking the first type and the sum of value:

    out = (
        df.groupby(df["type"].ne(df["type"].shift()).cumsum(), as_index=False)
        .agg({"type": "first", "value": "sum"})
        .rename(columns={"value": "grand"})
    )
    
    print(out)
    

    Prints:

         type  grand
    0  profit     21
    1    loss     -5
    2  profit     65
    3    loss    -65
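
To see why the grouping key works, here is a self-contained sketch that rebuilds the question's DataFrame and splits the key into its two steps. The names starts_new_run and run_id are my own, chosen only for illustration. I also use named aggregation with reset_index(drop=True) rather than as_index=False, which is just one way to write it; it should give the same result as the snippet above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "type": ["profit", "profit", "loss", "profit",
                 "profit", "loss", "loss", "loss"],
        "value": [11, 10, -5, 50, 15, -30, -25, -10],
    }
)

# Step 1: True wherever "type" differs from the previous row.
# The first row is always True because shift() puts NaN there.
starts_new_run = df["type"].ne(df["type"].shift())

# Step 2: a running total of those flags gives every consecutive run
# its own integer label ("run_id" is a hypothetical name).
run_id = starts_new_run.cumsum().rename("run_id")
print(run_id.tolist())  # [1, 1, 2, 3, 3, 4, 4, 4]

# Step 3: group by the run label, keep the first type and sum the values.
out = (
    df.groupby(run_id)
    .agg(type=("type", "first"), grand=("value", "sum"))
    .reset_index(drop=True)  # replace the run labels with a 0..n-1 index
)
print(out)
```

Grouping by "type" alone would merge all profit rows and all loss rows into two groups; the run label is what keeps separate runs of the same type apart.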