Search code examples
pythonpandasnumpydata-munging

change selected columns based on regex to percentage


I have the below data:

Shiver  Shiver - H2O -Multi Shiver-SHOWME -Multi    Shiver - STARWARS - Multi   Shiver - UFC - Multi    Total   no_of_prize
3.30    46.94               14.83                   10.76                       0.00                    575.30  3
4.01    21.31               21.28                   10.77                       0.00                    568.16  3
1.24    0.15                0.46                    0.00                        0.00                    482.74  2
0.90    0.43                0.94                    0.00                        0.00                    128.38  2
2.91    2.93                0.75                    0.86                        0.00                    265.52  3
0.31    8.09                4.70                    0.72                        0.00                    117.81  3
0.25    0.83                2.12                    0.00                        0.00                    55.84   2
0.04    2.43                1.39                    1.64                        0.00                    457.23  3
5.95    12.28               4.33                    5.73                        0.00                    540.54  3

For each of the columns (except Total and no_of_prize) are to be changed to percentage of Total. Now the number of columns might vary from data to data and hence I want to use regex to identify columns like Shiver and Shiver - Multi. For example another data could be

Horror Horror - FRANKESTIEN - Multi Total no_of_prize

I want to handle this automatically by using regex or in any other way feasible.

Expected output:

Shiver  Shiver - H2O -Multi Shiver-SHOWME -Multi    Shiver - STARWARS - Multi   Shiver - UFC - Multi    Total   no_of_prem
0.57%   8.16%               2.58%                   1.87%                       0.00%                   575.30  3
0.71%   3.75%               3.75%                   1.89%                       0.00%                   568.16  3
0.26%   0.03%               0.09%                   0.00%                       0.00%                   482.74  2
0.71%   0.34%               0.73%                   0.00%                       0.00%                   128.38  2
1.10%   1.10%               0.28%                   0.33%                       0.00%                   265.52  3
0.27%   6.87%               3.99%                   0.61%                       0.00%                   117.81  3
0.46%   1.48%               3.80%                   0.00%                       0.00%                   55.84   2
0.01%   0.53%               0.30%                   0.36%                       0.00%                   457.23  3
1.10%   2.27%               0.80%                   1.06%                       0.00%                   540.54  3

Can someone please help me with this?


Solution

  • Filter out unnecessary columns by difference first, then divide by column Total by div and assign back:

    cols = df.columns.difference(['Total','no_of_prize'])
    print (cols)
    Index(['Shiver', 'Shiver - H2O -Multi', 'Shiver - STARWARS - Multi',
           'Shiver - UFC - Multi', 'Shiver-SHOWME -Multi'],
          dtype='object')
    
    df[cols] = df[cols].div(df['Total'], axis=0).mul(100)
    

    Alternative solution is use boolean mask with isin and invert by ~ - need all columns without Total and no_of_prize:

    col_mask = ~df.columns.isin(['Total','no_of_prize'])
    print (col_mask)
    [ True  True  True  True  True False False]
    
    df.loc[:, col_mask] = df.loc[:, col_mask].div(df['Total'], axis=0).mul(100)
    

    print (df)
         Shiver  Shiver - H2O -Multi  Shiver-SHOWME -Multi  \
    0  0.573614             8.159221              2.577786   
    1  0.705787             3.750704              3.745424   
    2  0.256867             0.031073              0.095289   
    3  0.701044             0.334943              0.732201   
    4  1.095963             1.103495              0.282465   
    5  0.263136             6.866989              3.989475   
    6  0.447708             1.486390              3.796562   
    7  0.008748             0.531461              0.304005   
    8  1.100751             2.271802              0.801051   
    
       Shiver - STARWARS - Multi  Shiver - UFC - Multi   Total  no_of_prize  
    0                   1.870329                   0.0  575.30            3  
    1                   1.895593                   0.0  568.16            3  
    2                   0.000000                   0.0  482.74            2  
    3                   0.000000                   0.0  128.38            2  
    4                   0.323893                   0.0  265.52            3  
    5                   0.611154                   0.0  117.81            3  
    6                   0.000000                   0.0   55.84            2  
    7                   0.358682                   0.0  457.23            3  
    8                   1.060051                   0.0  540.54            3 
    

    For convert numeric columns to strings are 2 ways:

    If need percentage with double digit after . use format:

    cols = df.columns.difference(['Total','no_of_prize'])
    df[cols] = df[cols].div(df['Total'], axis=0).mul(100).round(2).applymap('{:,.2f}%'.format)
    print (df)
      Shiver Shiver - H2O -Multi Shiver-SHOWME -Multi Shiver - STARWARS - Multi  \
    0  0.57%               8.16%                2.58%                     1.87%   
    1  0.71%               3.75%                3.75%                     1.90%   
    2  0.26%               0.03%                0.10%                     0.00%   
    3  0.70%               0.33%                0.73%                     0.00%   
    4  1.10%               1.10%                0.28%                     0.32%   
    5  0.26%               6.87%                3.99%                     0.61%   
    6  0.45%               1.49%                3.80%                     0.00%   
    7  0.01%               0.53%                0.30%                     0.36%   
    8  1.10%               2.27%                0.80%                     1.06%   
    
      Shiver - UFC - Multi   Total  no_of_prize  
    0                0.00%  575.30            3  
    1                0.00%  568.16            3  
    2                0.00%  482.74            2  
    3                0.00%  128.38            2  
    4                0.00%  265.52            3  
    5                0.00%  117.81            3  
    6                0.00%   55.84            2  
    7                0.00%  457.23            3  
    8                0.00%  540.54            3   
    

    Or if double digit are not necessary:

    df[cols] = df[cols].div(df['Total'], axis=0).mul(100).round(2).astype(str) + '%'
    print (df)
      Shiver Shiver - H2O -Multi Shiver-SHOWME -Multi Shiver - STARWARS - Multi  \
    0  0.57%               8.16%                2.58%                     1.87%   
    1  0.71%               3.75%                3.75%                      1.9%   
    2  0.26%               0.03%                 0.1%                      0.0%   
    3   0.7%               0.33%                0.73%                      0.0%   
    4   1.1%                1.1%                0.28%                     0.32%   
    5  0.26%               6.87%                3.99%                     0.61%   
    6  0.45%               1.49%                 3.8%                      0.0%   
    7  0.01%               0.53%                 0.3%                     0.36%   
    8   1.1%               2.27%                 0.8%                     1.06%   
    
      Shiver - UFC - Multi   Total  no_of_prize  
    0                 0.0%  575.30            3  
    1                 0.0%  568.16            3  
    2                 0.0%  482.74            2  
    3                 0.0%  128.38            2  
    4                 0.0%  265.52            3  
    5                 0.0%  117.81            3  
    6                 0.0%   55.84            2  
    7                 0.0%  457.23            3  
    8                 0.0%  540.54            3