I'm new to programming overall, and I'm struggling with some pandas df aggregation.
I'm trying to group a df by two columns "A" and "B" and then the series to display the frequency of B, over all the data, not only the group. I'm trying the below. group = df.groupby(['A', 'B']).size() ###this will show only the group frequency of B.
Let's say A is a transaction Id and B is a product. I want to know how many times each product appears when looking over all transactions, but in this structure of grouping, and keeping it into a grouped series not changing back to a df.
Thank you
You can use the pd.pivot_table to do the summary:
# Import packages
import pandas as pd, numpy as np
# Initialize a sample dataframe
df = pd.DataFrame({
"Transacion_ID": [1, 2, 3, 4, 5, 6, 7, 8, 9],
"Product": ["milk", "milk", "milk", "milk", "milk",
"bread", "bread", "bread", "bread"],
"Region": ["Eastern", "Eastern", "Eastern", "Eastern", "Eastern",
"Western", "Western", "Western", "Western"]
})
# Display the dataframe
display(df)
# Use pd.pivot_table fuhction to create the summary
table = pd.pivot_table(
df,
values='Transacion_ID',
index=['Product'],
aggfunc='count')
# Finally show the results
display(table)
You can also simply use the groupby function followed by the agg function as follows:
# Groupby and aggregate
table = df.groupby(['Product']).agg({
'Transacion_ID': 'count'
})
# Finally show the results
display(table)