Search code examples
pythonpandasdataframe

Pandas groupby - group columns to a list of count of row values


I want to group my dataframe according to a column [Say: Release] with a sum of values from other rows in a customized representation.

My Dataframe:

|  Release  | Pass | Fail | Total |
|:---------:|:----:|:----:|:-----:|
| release_a | 10   | 20   | 30    |
| release_a | 5    | 45   | 50    |
| release_a | 5    | 23   | 28    |
| release_a | 20   | 67   | 87    |
| release_a | 87   | 11   | 98    |
| release_b | 2    | 5    | 7     |
| release_b | 10   | 45   | 55    |
| release_b | 64   | 33   | 97    |
| release_c | 3    | 15   | 18    |
| release_c | 104  | 89   | 193   |
| release_c | 98   | 87   | 185   |

Output expected:

|  Release  |         Summary        |
|:---------:|:----------------------:|
| release_a | [Pass: 40, Fail: 55]   |
| release_b | [Pass: 76, Fail: 83]   |
| release_c | [Pass: 205, Fail: 191] |

I tried using groupby and agg, but couldnt achieve my expected result.

I used the below codes:

df.groupby(['Release Name'],as_index=False).agg({'Pass': lambda x: x.to_numpy().tolist()})
df['Release Name'].map(df.value_counts('Pass'))

Solution

  • You could use groupby.sum, then agg:

    out = (df
       .drop(columns=['Total'], errors='ignore')
       .groupby('Release').sum()
       .agg(lambda x: list(zip(x.index, x.values)), axis=1)
       .reset_index(name='Summary')
    )
    

    Output:

         Release                     Summary
    0  release_a  [(Pass, 127), (Fail, 166)]
    1  release_b    [(Pass, 76), (Fail, 83)]
    2  release_c  [(Pass, 205), (Fail, 191)]
    

    As dictionary (.agg(lambda x: dict(zip(x.index, x.values)), axis=1)):

         Release                     Summary
    0  release_a  {'Pass': 127, 'Fail': 166}
    1  release_b    {'Pass': 76, 'Fail': 83}
    2  release_c  {'Pass': 205, 'Fail': 191}
    

    Alternatively, using a single groupby.apply:

    cols = ['Pass', 'Fail']
    
    out = (df
       .groupby('Release')
       .apply(lambda x: x[cols].sum().to_dict(),
              include_groups=False)
       .reset_index(name='Summary')
    )