I have a data frame similar to this one
GRP HOST1 HOST2 HOST3 FILESIZE
0 0 srv39 srv45 srv47 203498176
1 1 srv102 srv36 srv38 452763956
2 1 srv101 srv36 srv45 453277268
3 1 srv101 srv34 srv45 448174741
4 1 srv36 srv49 srv50 452728577
5 2 srv100 srv47 srv48 454617541
6 2 srv100 srv45 srv49 454617541
7 2 srv38 srv49 srv47 454617541
Now what I would like to achieve is count all occurrences that I have across HOST1 HOST2 and HOST3 column grouped by the GRP column, like this
--
GRP HOST count
1 srv101 2
srv36 3
It would be perfect if I would be able to sum the value of the FILESIZE column.
I was trying to shape a solution using suggestions that I have found here, but I have not been able to get the count grouped by GRP.
Any suggestion about which would be the best approach to obtain the results that I need with pandas?
Use melt
for reshape anf then aggregate size
:
df = (df.melt(id_vars='GRP', value_vars=['HOST1','HOST2','HOST3'], value_name='HOST')
.groupby(['GRP', 'HOST'])
.size()
.reset_index(name='count'))
print (df)
GRP HOST count
0 0 srv39 1
1 0 srv45 1
2 0 srv47 1
3 1 srv101 2
4 1 srv102 1
5 1 srv34 1
6 1 srv36 3
7 1 srv38 1
8 1 srv45 2
9 1 srv49 1
10 1 srv50 1
11 2 srv100 2
12 2 srv38 1
13 2 srv45 1
14 2 srv47 2
15 2 srv48 1
16 2 srv49 2
If want sum
of column FILESIZE
use agg
:
df1 = (df.melt(id_vars=['GRP', 'FILESIZE'], value_vars=['HOST1','HOST2','HOST3'], value_name='HOST')
.groupby(['GRP', 'HOST'])['FILESIZE']
.agg(['size','sum'])
.reset_index()
)
print (df1)
GRP HOST size sum
0 0 srv39 1 203498176
1 0 srv45 1 203498176
2 0 srv47 1 203498176
3 1 srv101 2 901452009
4 1 srv102 1 452763956
5 1 srv34 1 448174741
6 1 srv36 3 1358769801
7 1 srv38 1 452763956
8 1 srv45 2 901452009
9 1 srv49 1 452728577
10 1 srv50 1 452728577
11 2 srv100 2 909235082
12 2 srv38 1 454617541
13 2 srv45 1 454617541
14 2 srv47 2 909235082
15 2 srv48 1 454617541
16 2 srv49 2 909235082