I want to group my dataframe according to a column [Say: Release] with a sum of values from other rows in a customized representation.
My Dataframe:
| Release | Pass | Fail | Total |
|:---------:|:----:|:----:|:-----:|
| release_a | 10 | 20 | 30 |
| release_a | 5 | 45 | 50 |
| release_a | 5 | 23 | 28 |
| release_a | 20 | 67 | 87 |
| release_a | 87 | 11 | 98 |
| release_b | 2 | 5 | 7 |
| release_b | 10 | 45 | 55 |
| release_b | 64 | 33 | 97 |
| release_c | 3 | 15 | 18 |
| release_c | 104 | 89 | 193 |
| release_c | 98 | 87 | 185 |
Output expected:
| Release | Summary |
|:---------:|:----------------------:|
| release_a | [Pass: 40, Fail: 55] |
| release_b | [Pass: 76, Fail: 83] |
| release_c | [Pass: 205, Fail: 191] |
I tried using groupby and agg, but couldnt achieve my expected result.
I used the below codes:
df.groupby(['Release Name'],as_index=False).agg({'Pass': lambda x: x.to_numpy().tolist()})
df['Release Name'].map(df.value_counts('Pass'))
You could use groupby.sum
, then agg
:
out = (df
.drop(columns=['Total'], errors='ignore')
.groupby('Release').sum()
.agg(lambda x: list(zip(x.index, x.values)), axis=1)
.reset_index(name='Summary')
)
Output:
Release Summary
0 release_a [(Pass, 127), (Fail, 166)]
1 release_b [(Pass, 76), (Fail, 83)]
2 release_c [(Pass, 205), (Fail, 191)]
As dictionary (.agg(lambda x: dict(zip(x.index, x.values)), axis=1)
):
Release Summary
0 release_a {'Pass': 127, 'Fail': 166}
1 release_b {'Pass': 76, 'Fail': 83}
2 release_c {'Pass': 205, 'Fail': 191}
Alternatively, using a single groupby.apply
:
cols = ['Pass', 'Fail']
out = (df
.groupby('Release')
.apply(lambda x: x[cols].sum().to_dict(),
include_groups=False)
.reset_index(name='Summary')
)