I have the following pandas dataframe:
COL1 COL2 COL3
N1 1 2 0
N2 2 2 1
N3 3 2 1
I would like to apply a function to every cell so that each value is divided by the sum of its column, e.g.: x[N1, COL1] = x[N1, COL1] / sum(x[_, COL1])
The result should look like:
COL1 COL2 COL3
N1 1/6 2/6 0/2
N2 2/6 2/6 1/2
N3 3/6 2/6 1/2
I cannot simply use df.apply(lambda x: x/sum(x), axis=1), because the x in this case would be the whole column... How can I do this?
The best approach is a vectorized solution: divide the DataFrame by the Series of column sums:
df = df.div(df.sum())
print (df)
COL1 COL2 COL3
N1 0.166667 0.333333 0.0
N2 0.333333 0.333333 0.5
N3 0.500000 0.333333 0.5
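As a self-contained sketch of the vectorized approach, assuming the sample data from the question: df.sum() returns a Series indexed by column name, and df.div aligns that Series with the DataFrame's columns, so every cell is divided by the sum of its own column. The names col_sums and normalized are only illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {"COL1": [1, 2, 3], "COL2": [2, 2, 2], "COL3": [0, 1, 1]},
    index=["N1", "N2", "N3"],
)

# Series of per-column sums, indexed by column name
col_sums = df.sum()

# div matches the Series index against the DataFrame columns,
# so each cell is divided by the sum of its own column
normalized = df.div(col_sums)

print(normalized)
print(normalized.sum())  # each column should now sum to 1
```

If you ever need the row-wise version instead, df.div(df.sum(axis=1), axis=0) should do the same thing per row.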
Your solution (slower, hacky):
df = df.apply(lambda x: df[x.name].div(sum(x)))
print (df)
COL1 COL2 COL3
N1 0.166667 0.333333 0.0
N2 0.333333 0.333333 0.5
N3 0.500000 0.333333 0.5
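If the original columns contain no missing values, any NaN in the result comes from 0/0 and any inf from dividing a nonzero value by a zero column sum, so both can be replaced with placeholder values: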
import numpy as np
import pandas as pd

df = pd.DataFrame({'COL1': [1, 0, 3], 'COL2': [0, 0, 2], 'COL3': [-1, 0, 1]})
print (df)
COL1 COL2 COL3
0 1 0 -1
1 0 0 0
2 3 2 1
df1 = df.div(df.sum()).replace([np.inf, -np.inf], -88).fillna(-99)
print (df1)
COL1 COL2 COL3
0 0.25 0.0 -88.0
1 0.00 0.0 -99.0
2 0.75 1.0 -88.0
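An alternative sketch, in case you would rather avoid producing inf at all: mask the zero column sums with NaN before dividing, then fill the result. The fill value 0.0 and the names sums, safe_sums and result are assumptions for illustration; pick whatever placeholder suits your data. As above, this assumes the original columns have no missing values, since fillna would overwrite those too.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"COL1": [1, 0, 3], "COL2": [0, 0, 2], "COL3": [-1, 0, 1]})

sums = df.sum()

# Turn zero sums into NaN so the division yields NaN instead of +/-inf
safe_sums = sums.mask(sums == 0)

# Columns whose sum was zero become all-NaN, then get the placeholder
result = df.div(safe_sums).fillna(0.0)

print(result)
```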