Search code examples
pythonpandasdataframemulti-index

Pandas count all occurrences on different columns in a dataframe


I have a data frame similar to this one GRP HOST1 HOST2 HOST3 FILESIZE 0 0 srv39 srv45 srv47 203498176 1 1 srv102 srv36 srv38 452763956 2 1 srv101 srv36 srv45 453277268 3 1 srv101 srv34 srv45 448174741 4 1 srv36 srv49 srv50 452728577 5 2 srv100 srv47 srv48 454617541 6 2 srv100 srv45 srv49 454617541 7 2 srv38 srv49 srv47 454617541

Now what I would like to achieve is count all occurrences that I have across HOST1 HOST2 and HOST3 column grouped by the GRP column, like this

-- GRP HOST count 1 srv101 2 srv36 3 It would be perfect if I would be able to sum the value of the FILESIZE column. I was trying to shape a solution using suggestions that I have found here, but I have not been able to get the count grouped by GRP.

Any suggestion about which would be the best approach to obtain the results that I need with pandas?


Solution

  • Use melt for reshape anf then aggregate size:

    df = (df.melt(id_vars='GRP', value_vars=['HOST1','HOST2','HOST3'], value_name='HOST')
            .groupby(['GRP', 'HOST'])
            .size()
            .reset_index(name='count'))
    print (df)
        GRP    HOST  count
    0     0   srv39      1
    1     0   srv45      1
    2     0   srv47      1
    3     1  srv101      2
    4     1  srv102      1
    5     1   srv34      1
    6     1   srv36      3
    7     1   srv38      1
    8     1   srv45      2
    9     1   srv49      1
    10    1   srv50      1
    11    2  srv100      2
    12    2   srv38      1
    13    2   srv45      1
    14    2   srv47      2
    15    2   srv48      1
    16    2   srv49      2
    

    If want sum of column FILESIZE use agg:

    df1 = (df.melt(id_vars=['GRP', 'FILESIZE'], value_vars=['HOST1','HOST2','HOST3'], value_name='HOST')
            .groupby(['GRP', 'HOST'])['FILESIZE']
            .agg(['size','sum'])
            .reset_index()
            )
    print (df1)
        GRP    HOST  size         sum
    0     0   srv39     1   203498176
    1     0   srv45     1   203498176
    2     0   srv47     1   203498176
    3     1  srv101     2   901452009
    4     1  srv102     1   452763956
    5     1   srv34     1   448174741
    6     1   srv36     3  1358769801
    7     1   srv38     1   452763956
    8     1   srv45     2   901452009
    9     1   srv49     1   452728577
    10    1   srv50     1   452728577
    11    2  srv100     2   909235082
    12    2   srv38     1   454617541
    13    2   srv45     1   454617541
    14    2   srv47     2   909235082
    15    2   srv48     1   454617541
    16    2   srv49     2   909235082