Grouping a Pandas df by two columns and aggregate over lvl 1 in group

I'm new to programming overall, and I'm struggling with some pandas df aggregation.

I'm trying to group a df by two columns "A" and "B" and then the series to display the frequency of B, over all the data, not only the group. I'm trying the below. group = df.groupby(['A', 'B']).size() ###this will show only the group frequency of B.

Let's say A is a transaction Id and B is a product. I want to know how many times each product appears when looking over all transactions, but in this structure of grouping, and keeping it into a grouped series not changing back to a df.

Thank you

Solution

You can use the pd.pivot_table to do the summary:

# Import packages
import pandas as pd, numpy as np

# Initialize a sample dataframe
df = pd.DataFrame({
    "Transacion_ID": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "Product": ["milk", "milk", "milk", "milk", "milk",
                          "bread", "bread", "bread", "bread"],
    "Region": ["Eastern", "Eastern", "Eastern", "Eastern", "Eastern",
                          "Western", "Western", "Western", "Western"]

    })

# Display the dataframe
display(df)

# Use pd.pivot_table fuhction to create the summary
table = pd.pivot_table(
                        df, 
                        values='Transacion_ID', 
                        index=['Product'],
                        aggfunc='count')


# Finally show the results               
display(table)

You can also simply use the groupby function followed by the agg function as follows:

# Groupby and aggregate
table = df.groupby(['Product']).agg({
     'Transacion_ID': 'count'
})

# Finally show the results               
display(table)