Search code examples
pythonpandasbooleanboolean-logic

Pandas create boolean column based on whether 3 column values are all negative or all positive


I have the following data frame,

       f1           f2      f3  f4  f5          f6          f7
0   -0.004446   59.763107   x0  0   60.029999   59.160000   -0.014493
1   -0.003414   113.212220  x1  0   113.599998  113.650002  0.000440
2   -0.013123   36.435513   x2  1   36.919998   36.889999   -0.000813
3   0.003558    68.854090   x3  0   68.420158   68.410179   -0.000146
4   -0.006840   23.021446   x4  0   23.180000   23.100000   -0.003451
... ... ... ... ... ... ... ...
145 0.000724    253.113110  x5  1   252.929993  247.210007  -0.022615
146 0.006567    128.236680  x6  0   127.400002  127.059998  -0.002669
147 -0.009016   610.079200  x7  1   615.630005  605.369995  -0.016666
148 -0.011290   165.173920  x8  0   167.059998  158.300003  -0.052436
149 0.021474    358.496370  x9  0   350.959991  343.329987  -0.021740

Basically, for column f4, treat 0 as a negative or False and 1 as a positive or True.

If values from columns f1, f4 and f7 are all negative or all positive, the test column should return true for that row, else it should return false.

I want to create new column called 'test' that will say True or False based on my conditions. If in any cases, one of them don't match (either not all true or not all false), it will show False.

I can make the following code work with 2 columns,

df.loc[:,'test'] = df['f1'].ge(0).eq(df['f4'])

and it works fine.

However, if I try to chain it to add the f7 column like this,

df.loc[:,'test'] = df['f1'].ge(0).eq(df['f4']).eq(df['f7'].ge(0))

the results are wrong.

I want the test column to look like this,

       f1           f2      f3  f4  f5          f6          f7         test
0   -0.004446   59.763107   x0  0   60.029999   59.160000   -0.014493  True
1   -0.003414   113.212220  x1  0   113.599998  113.650002  0.000440   False
2   -0.013123   36.435513   x2  1   36.919998   36.889999   -0.000813  False
3   0.003558    68.854090   x3  0   68.420158   68.410179   -0.000146  False
4   -0.006840   23.021446   x4  0   23.180000   23.100000   -0.003451  True
... ... ... ... ... ... ... ...
145 0.000724    253.113110  x5  1   252.929993  247.210007  -0.022615  False
146 0.006567    128.236680  x6  0   127.400002  127.059998  -0.002669  False
147 -0.009016   610.079200  x7  1   615.630005  605.369995  -0.016666  False
148 -0.011290   165.173920  x8  0   167.059998  158.300003  -0.052436  True
149 0.021474    358.496370  x9  0   350.959991  343.329987  -0.021740  False

How do I get the code to work the way I want it to?


Solution

  • maybe because if f1 and f4 are negative, the first two comparisons will return positive, therefore chaining third will return false.

    I think what you are trying to do is negative of XOR operation.
    Code below will give you the idea.

    table = np.array(
       [[-5.5,0,-6.6],
        [-5.5,0, 6.6],
        [-5.5,1,-6.6],
        [-5.5,1, 6.6],
        [ 5.5,0,-6.6],
        [ 5.5,0, 6.6],
        [ 5.5,1,-6.6],
        [ 5.5,1, 6.6]]
    )
    
    pd_table = pd.DataFrame(table)
    
    df = pd_table
    and_all = df[0].ge(0) & df[1] & df[2].ge(0)
    or_all = df[0].ge(0) | df[1] | df[2].ge(0)
    
    print(and_all.eq(or_all))
    
    

    The output will be like this:

    0     True
    1    False
    2    False
    3    False
    4    False
    5    False
    6    False
    7     True
    dtype: bool