Search code examples
pythonpandascomparisonmincalculation

How to compare each cell in one column with a specific value in pandas?


I have a dataframe like this, I want to achieve this:

if the sign of A is the same with the sign of B, get a new column C = min(3, |A|); if the sign of A is is different from the sign of B, C = min(3, B); if the values for A & B are zero, C=A

Type subType  A       B     C     
 X    a       -1      4     3   
 X    a       5       9     3
 X    a       5       9     3
 X    b       1       4     1   
 X    b       3       5    ...
 X    b       5       0
 Y    a       -1      1         
 Y    a       3       2  
 Y    a       -5      3
 Y    b       1       4        
 Y    b       3       5 
 Y    b       5      -2

I tried :

if df["A"] * df["B"] > 0:
    df["C"] = (3, abs(df["A"]).min(axis=1)

It gave me error, seems like I can't compare a value '3' with a column directly, any suggestions?

Follow-up: what if the formula is more complex like C = A + min(3, |A|) *B?


Solution

  • Because if the values for A & B are zero it means use minimum between (3, abs(0)) what is always 0 solution should be simplify with numpy.where and numpy.minimum:

    #compare signs
    m = np.sign(df["A"]) == np.sign(df["B"])
    #alternative
    #m = (df["A"] * df["B"]) >= 0
    df['C'] = np.where(m, np.minimum(3, df.A.abs()), np.minimum(3, df.B))
    print (df)
       Type subType  A  B  C
    0     X       a -1  4  3
    1     X       a  5  9  3
    2     X       a  5  9  3
    3     X       b  1  4  1
    4     X       b  3  5  3
    5     X       b  5  0  0
    6     Y       a -1  1  1
    7     Y       a  3  2  3
    8     Y       a -5  3  3
    9     Y       b  1  4  1
    10    Y       b  3  5  3
    11    Y       b  5 -2 -2
    

    EDIT: If need more condition in pandas/numpy is possible use instead multiple np.where function numpy.select:

    m1 = np.sign(df.A) == np.sign(df.B)
    m2 = np.sign(df.A) == np.sign(df.C)
    
    s1 = df.A + np.minimum(3, df.A.abs()) * df.B
    s2 = df.C + np.minimum(3, df.A.abs()) * df.B
    
    df['D'] = np.select([m1, m2], [s1, s2], default=df.A)
    print (df)
       Type subType  A  B  C   D
    0     X       a -1  4  3  -1
    1     X       a  5  9  3  32
    2     X       a  5  9  3  32
    3     X       b  1  4  1   5
    4     X       b  3  5  3  18
    5     X       b  5  0  0   5
    6     Y       a -1  1  1  -1
    7     Y       a  3  2  3   9
    8     Y       a -5  3  3  -5
    9     Y       b  1  4  1   5
    10    Y       b  3  5  3  18
    11    Y       b  5 -2 -2   5