I want to group below type of dataset by postalcodes and calculate the share of completed orders each shipping method has per postalcode. I've implemented a csv-file and tried the code below but i realized I need MultiIndex for that - and since I have a loot of different postalcodes I'm not sure how to go with it.
postalcode | shipping_method | completed_orders |
---|---|---|
12345 | post1 | 1 |
12345 | post2 | 3 |
12345 | post3 | 2 |
11123 | post1 | 1 |
11123 | post2 | 2 |
import numpy as np
import pandas as pd
shipping_data = pd.read_csv("shipping_per_postalcode.csv")
shareof = lambda x: x/x.sum()
result = shipping_data['amount_users_completed'].groupby(level=['postalcode', 'shipping_option']).transform(sumto)
print(result)
You may need additional groupby to get the percentage contribution
df_agg=df_1.groupby(['postalcode', 'shipping_method'])['completed_orders'].sum()
df_agg.groupby(level=0).apply(lambda x: 100*x/float(x.sum()))