I have a dataframe like this, I want to achieve this:
if the sign of A is the same with the sign of B, get a new column C = min(3, |A|); if the sign of A is is different from the sign of B, C = min(3, B); if the values for A & B are zero, C=A
Type subType A B C
X a -1 4 3
X a 5 9 3
X a 5 9 3
X b 1 4 1
X b 3 5 ...
X b 5 0
Y a -1 1
Y a 3 2
Y a -5 3
Y b 1 4
Y b 3 5
Y b 5 -2
I tried :
if df["A"] * df["B"] > 0:
df["C"] = (3, abs(df["A"]).min(axis=1)
It gave me error, seems like I can't compare a value '3' with a column directly, any suggestions?
Follow-up: what if the formula is more complex like C = A + min(3, |A|) *B
?
Because if the values for A & B are zero it means use minimum between (3, abs(0)) what is always 0
solution should be simplify with numpy.where
and numpy.minimum
:
#compare signs
m = np.sign(df["A"]) == np.sign(df["B"])
#alternative
#m = (df["A"] * df["B"]) >= 0
df['C'] = np.where(m, np.minimum(3, df.A.abs()), np.minimum(3, df.B))
print (df)
Type subType A B C
0 X a -1 4 3
1 X a 5 9 3
2 X a 5 9 3
3 X b 1 4 1
4 X b 3 5 3
5 X b 5 0 0
6 Y a -1 1 1
7 Y a 3 2 3
8 Y a -5 3 3
9 Y b 1 4 1
10 Y b 3 5 3
11 Y b 5 -2 -2
EDIT: If need more condition in pandas/numpy is possible use instead multiple np.where
function numpy.select
:
m1 = np.sign(df.A) == np.sign(df.B)
m2 = np.sign(df.A) == np.sign(df.C)
s1 = df.A + np.minimum(3, df.A.abs()) * df.B
s2 = df.C + np.minimum(3, df.A.abs()) * df.B
df['D'] = np.select([m1, m2], [s1, s2], default=df.A)
print (df)
Type subType A B C D
0 X a -1 4 3 -1
1 X a 5 9 3 32
2 X a 5 9 3 32
3 X b 1 4 1 5
4 X b 3 5 3 18
5 X b 5 0 0 5
6 Y a -1 1 1 -1
7 Y a 3 2 3 9
8 Y a -5 3 3 -5
9 Y b 1 4 1 5
10 Y b 3 5 3 18
11 Y b 5 -2 -2 5