I am facing the following problem: I can do conditional column calculations, but, i can't use the functions min, max, mean for every row by condition
import pandas as pd
data = {'Name': ['Tom', 'nick', 'krish', 'jack','bob'],
'level_1': [2, -3, 7, -4,-2],
'level_2': [6, 5, -3, -9,2],
'level_3': [-2, -1, 4, 6,-4],
}
df = pd.DataFrame(data)
print(df)
def f(row):
if (row['level_1'] < 0) & (row['level_2'] < 0):
val = row['level_1'] + row['level_2']
elif (row['level_1'] > 0) & (row['level_2'] > 0):
val = row['level_1'] - row['level_2']
else:
val = row[['level_1','level_2','level_3']].max#(axis=1)
return val
df['level_final'] = df.apply(f, axis=1)
print(40*'#')
print(df)
can any one help me please?
Here is a cleaner way to do this (and faster) using numpy
, than using lambda with an apply method (which is not vectorized over rows)-
import numpy as np
df = df.set_index('Name') #to work with numeric features
#Your 2 conditions
c1 = (df.level_1 < 0) & (df.level_2 < 0)
c2 = (df.level_1 > 0) & (df.level_2 > 0)
#Your 3 values
v1 = df.level_1 + df.level_2 #for c1
v2 = df.level_1 - df.level_2 #for c2
v3 = df.max(1) #if c1 and c2 fail
#apply everything, everywhere all at once
df['val'] = np.select([c1, c2], [v1, v2], default=v3)
df = df.reset_index()
print(df)
Name level_1 level_2 level_3 val
0 Tom 2 6 -2 -4
1 nick -3 5 -1 5
2 krish 7 -3 4 7
3 jack -4 -9 6 -13
4 bob -2 2 -4 2
Read more about numpy.select
here.
"i need max from every row (axis=1) by condition" (from your comment)
The reason your code fails is that you are misunderstanding how the apply
method works. When you are using an apply function over rows, you are pulling each row as a pd.Series
object and then applying your function f
function on that, one row at a time.
When you use the pd.Series.max(axis=1)
on any row series object, it throws the following error as expected -
ValueError: No axis named 1 for object type Series
For a series object, you can't use max.axis(1)
since it has no axis. So instead of using that or simply max
which is the incorrect syntax for a method, just use pd.Series.max()
. Read documentation here
#Your code modified
def f(row):
if (row['level_1'] < 0) & (row['level_2'] < 0):
val = row['level_1'] + row['level_2']
elif (row['level_1'] > 0) & (row['level_2'] > 0):
val = row['level_1'] - row['level_2']
else:
val = row[['level_1','level_2','level_3']].max() #<---
return val
df['level_final'] = df.apply(f, axis=1)
print(df)
Name level_1 level_2 level_3 level_final
0 Tom 2 6 -2 -4
1 nick -3 5 -1 5
2 krish 7 -3 4 7
3 jack -4 -9 6 -13
4 bob -2 2 -4 2