I already referred the posts here and here. Don't mark it as duplicate
I have a dataframe like as below
id,status,country,amount,qty
1,pass,USA,123,4500
1,pass,USA,156,3210
1,fail,UK,687,2137
1,fail,UK,456,1236
2,pass,AUS,216,324
2,pass,AUS,678,241
2,nan,ANZ,637,213
2,pass,ANZ,213,543
sf = pd.read_clipboard(sep=',')
I would like to get the percentage of values from each column as a seperate column
So, with the help of this post, I tried the below
Approach - 1 Doesn't give expected output shape
(pd.crosstab(sf['id'],[sf['status'].fillna('nan'),sf['country'].fillna('nan')],normalize=0)
.drop('nan', 1)
.mul(100)).reset_index()
Approach - 2 - Doesn't give expected output
sf_inv= sf.melt()
pd.crosstab(sf_inv.value, sf_inv.variable)
I expect my output to be like as below
You can use crosstab
with normalize='index'
on your different columns and concat
the results:
pd.concat([pd.crosstab(sf['id'], sf[c], normalize='index')
for c in ['status', 'country']], axis=1).mul(100).add_suffix('_pct')
output:
fail_pct pass_pct ANZ_pct AUS_pct UK_pct USA_pct
id
1 50.0 50.0 0.0 0.0 50.0 50.0
2 0.0 100.0 50.0 50.0 0.0 0.0
pd.concat([pd.crosstab(sf['id'], sf[c].fillna('NA'), normalize='index')
.drop(columns='NA', errors='ignore')
for c in ['status', 'country']], axis=1).mul(100).add_suffix('_pct')
output:
fail_pct pass_pct ANZ_pct AUS_pct UK_pct USA_pct
id
1 50.0 50.0 0.0 0.0 50.0 50.0
2 0.0 75.0 50.0 50.0 0.0 0.0