I have the below data:
Shiver Shiver - H2O -Multi Shiver-SHOWME -Multi Shiver - STARWARS - Multi Shiver - UFC - Multi Total no_of_prize
3.30 46.94 14.83 10.76 0.00 575.30 3
4.01 21.31 21.28 10.77 0.00 568.16 3
1.24 0.15 0.46 0.00 0.00 482.74 2
0.90 0.43 0.94 0.00 0.00 128.38 2
2.91 2.93 0.75 0.86 0.00 265.52 3
0.31 8.09 4.70 0.72 0.00 117.81 3
0.25 0.83 2.12 0.00 0.00 55.84 2
0.04 2.43 1.39 1.64 0.00 457.23 3
5.95 12.28 4.33 5.73 0.00 540.54 3
For each of the columns (except Total and no_of_prize) are to be changed to percentage of Total. Now the number of columns might vary from data to data and hence I want to use regex to identify columns like Shiver and Shiver - Multi. For example another data could be
Horror Horror - FRANKESTIEN - Multi Total no_of_prize
I want to handle this automatically by using regex or in any other way feasible.
Expected output:
Shiver Shiver - H2O -Multi Shiver-SHOWME -Multi Shiver - STARWARS - Multi Shiver - UFC - Multi Total no_of_prem
0.57% 8.16% 2.58% 1.87% 0.00% 575.30 3
0.71% 3.75% 3.75% 1.89% 0.00% 568.16 3
0.26% 0.03% 0.09% 0.00% 0.00% 482.74 2
0.71% 0.34% 0.73% 0.00% 0.00% 128.38 2
1.10% 1.10% 0.28% 0.33% 0.00% 265.52 3
0.27% 6.87% 3.99% 0.61% 0.00% 117.81 3
0.46% 1.48% 3.80% 0.00% 0.00% 55.84 2
0.01% 0.53% 0.30% 0.36% 0.00% 457.23 3
1.10% 2.27% 0.80% 1.06% 0.00% 540.54 3
Can someone please help me with this?
Filter out unnecessary columns by difference
first, then divide by column Total
by div
and assign back:
cols = df.columns.difference(['Total','no_of_prize'])
print (cols)
Index(['Shiver', 'Shiver - H2O -Multi', 'Shiver - STARWARS - Multi',
'Shiver - UFC - Multi', 'Shiver-SHOWME -Multi'],
dtype='object')
df[cols] = df[cols].div(df['Total'], axis=0).mul(100)
Alternative solution is use boolean mask
with isin
and invert by ~
- need all columns without Total
and no_of_prize
:
col_mask = ~df.columns.isin(['Total','no_of_prize'])
print (col_mask)
[ True True True True True False False]
df.loc[:, col_mask] = df.loc[:, col_mask].div(df['Total'], axis=0).mul(100)
print (df)
Shiver Shiver - H2O -Multi Shiver-SHOWME -Multi \
0 0.573614 8.159221 2.577786
1 0.705787 3.750704 3.745424
2 0.256867 0.031073 0.095289
3 0.701044 0.334943 0.732201
4 1.095963 1.103495 0.282465
5 0.263136 6.866989 3.989475
6 0.447708 1.486390 3.796562
7 0.008748 0.531461 0.304005
8 1.100751 2.271802 0.801051
Shiver - STARWARS - Multi Shiver - UFC - Multi Total no_of_prize
0 1.870329 0.0 575.30 3
1 1.895593 0.0 568.16 3
2 0.000000 0.0 482.74 2
3 0.000000 0.0 128.38 2
4 0.323893 0.0 265.52 3
5 0.611154 0.0 117.81 3
6 0.000000 0.0 55.84 2
7 0.358682 0.0 457.23 3
8 1.060051 0.0 540.54 3
For convert numeric columns to strings are 2 ways:
If need percentage with double digit after .
use format
:
cols = df.columns.difference(['Total','no_of_prize'])
df[cols] = df[cols].div(df['Total'], axis=0).mul(100).round(2).applymap('{:,.2f}%'.format)
print (df)
Shiver Shiver - H2O -Multi Shiver-SHOWME -Multi Shiver - STARWARS - Multi \
0 0.57% 8.16% 2.58% 1.87%
1 0.71% 3.75% 3.75% 1.90%
2 0.26% 0.03% 0.10% 0.00%
3 0.70% 0.33% 0.73% 0.00%
4 1.10% 1.10% 0.28% 0.32%
5 0.26% 6.87% 3.99% 0.61%
6 0.45% 1.49% 3.80% 0.00%
7 0.01% 0.53% 0.30% 0.36%
8 1.10% 2.27% 0.80% 1.06%
Shiver - UFC - Multi Total no_of_prize
0 0.00% 575.30 3
1 0.00% 568.16 3
2 0.00% 482.74 2
3 0.00% 128.38 2
4 0.00% 265.52 3
5 0.00% 117.81 3
6 0.00% 55.84 2
7 0.00% 457.23 3
8 0.00% 540.54 3
Or if double digit are not necessary:
df[cols] = df[cols].div(df['Total'], axis=0).mul(100).round(2).astype(str) + '%'
print (df)
Shiver Shiver - H2O -Multi Shiver-SHOWME -Multi Shiver - STARWARS - Multi \
0 0.57% 8.16% 2.58% 1.87%
1 0.71% 3.75% 3.75% 1.9%
2 0.26% 0.03% 0.1% 0.0%
3 0.7% 0.33% 0.73% 0.0%
4 1.1% 1.1% 0.28% 0.32%
5 0.26% 6.87% 3.99% 0.61%
6 0.45% 1.49% 3.8% 0.0%
7 0.01% 0.53% 0.3% 0.36%
8 1.1% 2.27% 0.8% 1.06%
Shiver - UFC - Multi Total no_of_prize
0 0.0% 575.30 3
1 0.0% 568.16 3
2 0.0% 482.74 2
3 0.0% 128.38 2
4 0.0% 265.52 3
5 0.0% 117.81 3
6 0.0% 55.84 2
7 0.0% 457.23 3
8 0.0% 540.54 3