Tags: python, pandas, if-statement, max

Pandas: how to take max, min, mean from rows by condition


I am facing the following problem: I can do conditional column calculations, but I can't use the functions min, max, and mean for every row by condition.

import pandas as pd
data = {'Name': ['Tom', 'nick', 'krish', 'jack','bob'],
        'level_1': [2, -3, 7, -4,-2],
        'level_2': [6, 5, -3, -9,2],
        'level_3': [-2, -1, 4, 6,-4],
        }

df = pd.DataFrame(data)

print(df)


def f(row):
    if (row['level_1'] < 0) & (row['level_2'] < 0):
        
        val = row['level_1'] + row['level_2']
    elif (row['level_1'] > 0) & (row['level_2'] > 0):

        val = row['level_1'] - row['level_2']

    else:
        val = row[['level_1','level_2','level_3']].max#(axis=1)

    return val

df['level_final'] = df.apply(f, axis=1)

print(40*'#')

print(df)

Can anyone help me, please?


Solution

    # Cleaner & faster solution

    Here is a cleaner (and faster) way to do this using numpy, rather than passing a function to the apply method (which is not vectorized over rows) -

    import numpy as np
    
    df = df.set_index('Name')       #to work with numeric features
    
    #Your 2 conditions
    c1 = (df.level_1 < 0) & (df.level_2 < 0)
    c2 = (df.level_1 > 0) & (df.level_2 > 0)
    
    #Your 3 values
    v1 = df.level_1 + df.level_2    #for c1
    v2 = df.level_1 - df.level_2    #for c2
    v3 = df.max(axis=1)             #if c1 and c2 fail
    
    #apply everything, everywhere all at once
    df['val'] = np.select([c1, c2], [v1, v2], default=v3)
    df = df.reset_index()
    
    print(df)
    
        Name  level_1  level_2  level_3  val
    0    Tom        2        6       -2   -4
    1   nick       -3        5       -1    5
    2  krish        7       -3        4    7
    3   jack       -4       -9        6  -13
    4    bob       -2        2       -4    2
    

    Read more about numpy.select in the NumPy documentation.
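
    If numpy.select is new to you, here is a minimal, standalone sketch of its semantics (the array and values are illustrative, not from the question) -

    import numpy as np

    x = np.array([-2, 0, 3, 7])

    #conditions are checked in order; the first match wins per element
    conditions = [x < 0, x > 5]
    choices = [x * -1, x * 10]      #value used where the matching condition is True

    #default fills positions where no condition matched
    print(np.select(conditions, choices, default=x))    #[ 2  0  3 70]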


    # Notes on your current code

    "i need max from every row (axis=1) by condition" (from your comment)

    The reason your code fails is a misunderstanding of how the apply method works. When you apply a function over rows (axis=1), each row is pulled out as a pd.Series object and your function f is called on it, one row at a time.

    When you call .max(axis=1) on any of these row Series objects, it throws the following error, as expected -

    ValueError: No axis named 1 for object type Series
    

    A Series has only a single axis (axis=0), so you can't pass axis=1 to .max(). And writing .max without parentheses, as in your code, doesn't call the method at all; it just returns the bound method object. Instead, call pd.Series.max() with parentheses. See the pd.Series.max documentation for details.
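
    To see this concretely, here is a tiny sketch with an illustrative row -

    import pandas as pd

    row = pd.Series({'level_1': 2, 'level_2': 6, 'level_3': -2})

    print(row.max())    #6 -- a Series reduces over its single axis
    print(row.max)      #<bound method ...> -- missing parentheses returns the method, not the max
    #row.max(axis=1)    #raises ValueError: No axis named 1 for object type Series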

    #Your code modified
    def f(row):
        if (row['level_1'] < 0) & (row['level_2'] < 0):
            val = row['level_1'] + row['level_2']
        elif (row['level_1'] > 0) & (row['level_2'] > 0):
            val = row['level_1'] - row['level_2']
        else:
            val = row[['level_1','level_2','level_3']].max() #<---
        return val
    
    df['level_final'] = df.apply(f, axis=1)
    print(df)
    
        Name  level_1  level_2  level_3  level_final
    0    Tom        2        6       -2           -4
    1   nick       -3        5       -1            5
    2  krish        7       -3        4            7
    3   jack       -4       -9        6          -13
    4    bob       -2        2       -4            2
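
    If you want to verify the speed difference yourself, here is a rough timing sketch (the frame size and repeat count are arbitrary, and it assumes the corrected f above is in scope; exact numbers will vary by machine) -

    import timeit
    import numpy as np
    import pandas as pd

    #a larger frame so the difference is visible
    big = pd.DataFrame(np.random.randint(-10, 10, size=(10_000, 3)),
                       columns=['level_1', 'level_2', 'level_3'])

    def with_select(df):
        c1 = (df.level_1 < 0) & (df.level_2 < 0)
        c2 = (df.level_1 > 0) & (df.level_2 > 0)
        return np.select([c1, c2],
                         [df.level_1 + df.level_2, df.level_1 - df.level_2],
                         default=df.max(axis=1))

    print(timeit.timeit(lambda: with_select(big), number=10))       #vectorized
    print(timeit.timeit(lambda: big.apply(f, axis=1), number=10))   #row-wise apply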