Below is a dataframe.
team task day in out
0 A X M 60 11
1 A X T 80 12
2 A Y M 91 31
3 A Y T 11 21
4 B X M 32 13
5 B X T 18 33
6 B Z M 28 42
7 B Z T 15 15
When groupby is performed with multiple columns, the output is as below.
df.groupby(['team', 'task','day']).sum()
in out
team task day
A X M 60 11
T 80 12
Y M 91 31
T 11 21
B X M 32 13
T 18 33
Z M 28 42
T 15 15
In the same output, I am looking to get the sum for each column level i.e. at team, task level as well. As shown below.
in out
team task day
A 242 75 <<<<<
A X 140 23 <<<<<
A X M 60 11
T 80 12
A Y 102 52 <<<<<
Y M 91 31
T 11 21
B 93 103 <<<<<
B X 50 46 <<<<<
B X M 32 13
T 18 33
B Z 43 57 <<<<
Z M 28 42
T 15 15
I get the sum for these columns when performed the grouping separately.
df.groupby(['team', 'task','day']).sum()
df.groupby(['team', 'task']).sum()
df.groupby(['team']).sum()
team task day
A X M 60 11
T 80 12
Y M 91 31
T 11 21
B X M 32 13
T 18 33
Z M 28 42
T 15 15
in out
team task
A X 140 23
Y 102 52
B X 50 46
Z 43 57
in out
team
A 242 75
B 93 103
I am looking for the combined output of sum done at each level as explained above.
This is a possible solution:
a = df.groupby(['team', 'task','day']).sum()
b = df.groupby(['team', 'task']).sum()
c = df.groupby(['team']).sum()
b['day'] = None
c[['task', 'day']] = None
a = a.reset_index()
b = b.reset_index()
c = c.reset_index()
result = pd.concat(
[a, b, c],
).set_index(
['team', 'task', 'day'],
).sort_index(
na_position="first",
)
This is the output:
>>> result
in out
team task day
A NaN NaN 242 75
X NaN 140 23
M 60 11
T 80 12
Y NaN 102 52
M 91 31
T 11 21
B NaN NaN 93 103
X NaN 50 46
M 32 13
T 18 33
Z NaN 43 57
M 28 42
T 15 15