python pandas function apply

Pandas apply function to both columns and rows


I have the following pandas DataFrame:

    COL1  COL2  COL3  
N1     1     2     0    
N2     2     2     1    
N3     3     2     1

I would like to apply a function to every cell, dividing each value by the sum of its column, e.g.: x[N1, COL1] = x[N1, COL1] / sum(x[_, COL1])

The result should look like:

    COL1  COL2  COL3  
N1   1/6   2/6   0/2    
N2   2/6   2/6   1/2    
N3   3/6   2/6   1/2

I cannot simply use df.apply(lambda x: x/sum(x), axis=1), because x in that case would be a whole row (or a whole column with axis=0), not a single cell. How can I do this?


Solution

  • The best approach is a vectorized solution: divide the DataFrame by the Series of column sums:

    df = df.div(df.sum())
    print (df)
            COL1      COL2  COL3
    N1  0.166667  0.333333   0.0
    N2  0.333333  0.333333   0.5
    N3  0.500000  0.333333   0.5
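
This works because `df.sum()` returns a Series indexed by column name, and `div` aligns that index with the frame's columns by default. The sketch below rebuilds the frame from the question (the construction is an assumption, since the original post does not show it) and also shows the row-wise counterpart, which needs `axis=0` so that the Series is aligned with the row index instead.

```python
import pandas as pd

# Rebuild the frame from the question (construction assumed for illustration).
df = pd.DataFrame(
    {"COL1": [1, 2, 3], "COL2": [2, 2, 2], "COL3": [0, 1, 1]},
    index=["N1", "N2", "N3"],
)

col_sums = df.sum()                 # Series indexed by COL1, COL2, COL3
by_column = df.div(col_sums)        # aligned on column labels: each column sums to 1

row_sums = df.sum(axis=1)           # Series indexed by N1, N2, N3
by_row = df.div(row_sums, axis=0)   # aligned on the row index: each row sums to 1

print(by_column)
print(by_row)
```

Here `by_column` is the result asked for in the question, while `by_row` is only included to show how the axis argument changes the alignment.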
    

    Your apply-based solution also works, but it is slower and hacky:

    df = df.apply(lambda x: df[x.name].div(sum(x)))
    print (df)
            COL1      COL2  COL3
    N1  0.166667  0.333333   0.0
    N2  0.333333  0.333333   0.5
    N3  0.500000  0.333333   0.5
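
If `apply` is preferred anyway, a somewhat simpler form is possible, because with the default `axis=0` the function already receives each column as a Series and there is no need to look it up again through `df[x.name]`. This is a minimal sketch on an assumed reconstruction of the question's frame; it checks that the result matches the vectorized version.

```python
import pandas as pd

# Frame from the question (construction assumed for illustration).
df = pd.DataFrame(
    {"COL1": [1, 2, 3], "COL2": [2, 2, 2], "COL3": [0, 1, 1]},
    index=["N1", "N2", "N3"],
)

# With the default axis=0, apply passes each column to the function as a Series.
via_apply = df.apply(lambda col: col / col.sum())

# Vectorized equivalent, for comparison.
via_div = df.div(df.sum())

pd.testing.assert_frame_equal(via_apply, via_div)  # raises if the two differ
print(via_apply)
```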
    

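    If the original columns contain no missing values, you can also flag division by zero: a column whose sum is 0 produces inf, -inf, or NaN, which can then be replaced with sentinel values (here -88 and -99):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'COL1': [1, 0, 3], 'COL2': [0, 0, 2], 'COL3': [-1, 0, 1]})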
        
    print (df)
       COL1  COL2  COL3
    0     1     0    -1
    1     0     0     0
    2     3     2     1
    
    
    # inf / -inf (nonzero value divided by a zero sum) -> -88; NaN (0 divided by 0) -> -99
    df1 = df.div(df.sum()).replace([np.inf, -np.inf], -88).fillna(-99)
    print (df1)
    
       COL1  COL2  COL3
    0  0.25   0.0 -88.0
    1  0.00   0.0 -99.0
    2  0.75   1.0 -88.0
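
As an alternative sketch (not part of the original answer), the zero sums can be masked before dividing, so that an affected column comes out as NaN throughout and no inf values are ever produced. The names `safe_sums` and `df2` are hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"COL1": [1, 0, 3], "COL2": [0, 0, 2], "COL3": [-1, 0, 1]})

sums = df.sum()
safe_sums = sums.mask(sums == 0)   # zero sums become NaN, other sums are kept
df2 = df.div(safe_sums)            # dividing by NaN gives NaN, never inf

print(df2)
```

Whether NaN or a sentinel such as -88 is the better marker depends on what the downstream code expects; NaN has the advantage that pandas aggregations skip it by default.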