Search code examples
pythonpandasreport

Reporting with Pandas


I'm trying to generate reports using Pandas, grouping by a set of fields:

This is what I'm doing:

#!/usr/bin/env python3

import pandas as pd

data = [
    {
        'id': 1,
        'name': 'name1',
        'pretty_name': 'Pretty Name 1',
        'server_name': 'exampleserver.local',
        'provider': 'provider1',
        'type': 'A',
        'status': 'KO'
    },
    {
        'id': 2,
        'name': 'name2',
        'pretty_name': 'Pretty Name 2',
        'server_name': 'exampleserver1.local',
        'provider': 'provider2',
        'type': 'B',
        'status': 'OK'
    },
    {
        'id': 1,
        'name': 'name1',
        'pretty_name': 'Pretty Name 1',
        'server_name': 'exampleserver.local',
        'provider': 'provider1',
        'type': 'A',
        'status': 'KO'
    },
    {
        'id': 1,
        'name': 'name1',
        'pretty_name': 'Pretty Name 1',
        'server_name': 'exampleserver.local',
        'provider': 'provider1',
        'type': 'A',
        'status': 'OK'
    },
    {
        'id': 2,
        'name': 'name2',
        'pretty_name': 'Pretty Name 2',
        'server_name': 'exampleserver.local',
        'provider': 'provider2',
        'type': 'A',
        'status': 'OK'
    }
]

df = pd.DataFrame(data)

grouped = df.groupby(['server_name', 'provider', 'type', 'status'])['id'].count()
print(grouped.to_string())

Which returns:

server_name           provider   type  status
exampleserver.local   provider1  A     KO        2
                                       OK        1
                      provider2  A     OK        1
exampleserver1.local  provider2  B     OK        1

This is alright, but I would like to add to the result a row containing the total for each provider. I.e.

server_name           provider   tot  type  status
exampleserver.local   provider1  3    A     KO        2
                                            OK        1
                      provider2  1    A     OK        1
exampleserver1.local  provider2  1    B     OK        1

I'm pretty sure this can be done quite easily with Pandas, but I've spent hours reading documentation with no luck.

Any pointers?

Thanks.

EDIT: I've corrected and extended the example as it didn't really made sense.


Solution

  • You can create helper column for compare if match provider1 with DataFrame.assign and Series.eq, convert to integers, so you can use sum for count matched values:

    grouped = (df.assign(new=df['provider'].str.contains('provider1').astype(int))
                 .groupby(['server_name', 'provider', 'type', 'status'])['new']
                 .agg([('count','size'), ('provider1_count','sum')])
                 .reset_index())
    print (grouped)
               server_name   provider type status  count  provider1_count
    0  exampleserver.local  provider1    A     KO      1                1
    1  exampleserver.local  provider2    A     OK      1                0
    2  exampleserver.local  provider2    B     OK      1                0
    

    EDIT:

    You can add as_index=False for DataFrame and rename column:

    df1 = (df.groupby(['server_name', 'provider', 'type', 'status'], as_index=False)['id']
             .count()
             .rename(columns={'id':'counts'}))
    

    Then if want new column in position 2 use DataFrame.insert with GroupBy.transform:

    df1.insert(2, 'tot', df1.groupby(['server_name','provider'])['counts'].transform('sum'))
    print(df1)
                server_name   provider  tot type status  counts
    0   exampleserver.local  provider1    3    A     KO       2
    1   exampleserver.local  provider1    3    A     OK       1
    2   exampleserver.local  provider2    1    A     OK       1
    3  exampleserver1.local  provider2    1    B     OK       1
    

    And last if need Multiindex use DataFrame.set_index:

    grouped = df1.set_index(['server_name', 'provider', 'tot','type', 'status'])['counts']
    print (grouped)
    server_name           provider   tot  type  status
    exampleserver.local   provider1  3    A     KO        2
                                                OK        1
                          provider2  1    A     OK        1
    exampleserver1.local  provider2  1    B     OK        1
    Name: counts, dtype: int64