python, pandas-groupby

Grouping a pandas DataFrame by two columns and aggregating over level 1 of the group


I'm new to programming overall, and I'm struggling with some pandas df aggregation.

I'm trying to group a df by two columns, "A" and "B", and have the resulting series show the frequency of B across all the data, not only within each group. This is what I have so far:

    group = df.groupby(['A', 'B']).size()  # this only shows the frequency of B within each group

Let's say A is a transaction ID and B is a product. I want to know how many times each product appears across all transactions, while keeping this grouping structure and keeping the result as a grouped series rather than converting it back to a df.

Thank you


Solution

  • You can use pd.pivot_table to build the summary:

    # Import packages
    import pandas as pd
    
    # Initialize a sample dataframe
    df = pd.DataFrame({
        "Transaction_ID": [1, 2, 3, 4, 5, 6, 7, 8, 9],
        "Product": ["milk", "milk", "milk", "milk", "milk",
                    "bread", "bread", "bread", "bread"],
        "Region": ["Eastern", "Eastern", "Eastern", "Eastern", "Eastern",
                   "Western", "Western", "Western", "Western"],
    })
    
    # Display the dataframe
    print(df)
    
    # Use the pd.pivot_table function to count the transactions per product
    table = pd.pivot_table(
        df,
        values="Transaction_ID",
        index=["Product"],
        aggfunc="count",
    )
    
    # Finally, show the results
    print(table)
    

    You can also simply use groupby followed by agg:

    # Group by product and count the transactions in each group
    table = df.groupby(["Product"]).agg({
        "Transaction_ID": "count"
    })
    
    # Finally, show the results
    print(table)
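
Both summaries above count each product over all transactions, but they drop the two-column grouping the question asks to keep. One possible way to keep the grouped series is to sum the group sizes over the second index level and broadcast the result back onto the (A, B) index. This is only a minimal sketch: the column names "A" and "B" come from the question, while the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: A is a transaction ID, B is a product
df = pd.DataFrame({
    "A": [1, 1, 2, 2, 3],
    "B": ["milk", "bread", "milk", "eggs", "milk"],
})

# Group sizes, indexed by (A, B), as in the question
group = df.groupby(["A", "B"]).size()

# Sum the sizes over index level "B" and broadcast the totals back,
# so every (A, B) row carries the overall frequency of its product
overall = group.groupby(level="B").transform("sum")

print(overall)
```

Here `overall` is still a Series with the same (A, B) MultiIndex as `group`, so the grouping structure is preserved while each row shows how often its product appears in the whole dataset.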