Search code examples
pythonpandaspandas-groupbyaggregationpercentage

Python pandas calculate share of after groupby


I want to group below type of dataset by postalcodes and calculate the share of completed orders each shipping method has per postalcode. I've implemented a csv-file and tried the code below but i realized I need MultiIndex for that - and since I have a loot of different postalcodes I'm not sure how to go with it.

postalcode shipping_method completed_orders
12345 post1 1
12345 post2 3
12345 post3 2
11123 post1 1
11123 post2 2
import numpy as np
import pandas as pd

shipping_data = pd.read_csv("shipping_per_postalcode.csv")

shareof = lambda x: x/x.sum()
result = shipping_data['amount_users_completed'].groupby(level=['postalcode', 'shipping_option']).transform(sumto)
print(result)

Solution

  • You may need additional groupby to get the percentage contribution

    df_agg=df_1.groupby(['postalcode', 'shipping_method'])['completed_orders'].sum()
    
    df_agg.groupby(level=0).apply(lambda x: 100*x/float(x.sum()))
    

    enter image description here

    Source: Pandas percentage of total with groupby