Search code examples
pythonpandasdataframesortingpercentage

Calculate cell percentage of row total and sort largest to smallest with some description using column name


I have a df like the one below:

data1 = [['A', 10, 30, 15, 45], 
        ['B', 22, 35, 25, 18], 
        ['C', 40, 27, 28, 5]]
       
df1 = pd.DataFrame(data1, columns= ['Units', 'Normal', 'Bad', 'Good', 'Neutral'])
df1

I want to get an additional column which calculates and shows each cell's percentage of the row total. But here is the trick, I want these percentages sorted from highest to lowest but also containing a string description of these percentages using the column name. What I want to achieve is shown in the df2 below under Proportion by largest to smallest column. My dataset is quite large and I would like to do this automatically.

data2 = [['A', 10, 30, 15, 45, 'Neutral  = 0.45%, Bad = 0.3%, Good = 0.15%, Normal = 0.10%'], 
            ['B', 22, 35, 25, 18, 'Bad = 0.35%, Good = 0.25%, Normal = 0.22%, Neutral = 0.18%'], 
            ['C', 40, 27, 28, 5, 'Normal = 0.4%, Good = 0.28%, Bad = 0.27%, Neutral = 0.05']]
           
df2 = pd.DataFrame(data2, columns= ['Units', 'Normal', 'Bad', 'Good', 'Neutral', 'Proportion by largest to smallest'])
df2

Solution

  • One rare case where I would recommend a custom function and apply:

    cols = df1.columns.difference(['Units'])
    
    def cust_fmt(s):
        order = s.sort_values(ascending=False).index
        return ', '.join(
               (s.index
                + ' = ' + s.div(100).round(2).astype(str) + '%'
               ).loc[order]
               )
    
    df1['Proportion'] = df1[cols].apply(cust_fmt, axis=1)
    

    output:

      Units  Normal  Bad  Good  Neutral                                                  Proportion
    0     A      10   30    15       45    Neutral = 0.45%, Bad = 0.3%, Good = 0.15%, Normal = 0.1%
    1     B      22   35    25       18  Bad = 0.35%, Good = 0.25%, Normal = 0.22%, Neutral = 0.18%
    2     C      40   27    28        5   Normal = 0.4%, Good = 0.28%, Bad = 0.27%, Neutral = 0.05%
    
    alternative with normalization:
    cols = df1.columns.difference(['Units'])
    
    def cust_fmt(s):
        order = s.sort_values(ascending=False).index
        return ', '.join(
               (s.index#.to_series()
                + ' = ' + s.astype(str) + '%'
               ).loc[order]
               )
    
    df1['Proportion'] = df1[cols].div(df1[cols].sum(axis=1), axis=0).round(2).apply(cust_fmt, axis=1)