Search code examples
pythonpandasdataframeconditional-statements

Extract rows from pandas dataframe with at least one value greater than or equal to values from array


I have the following dataframe:

    mba_p   ssc_p   hsc_p   degree_p etest_p    

0   58.80   67.00   91.00   58.00   55.0
1   66.28   79.33   78.33   77.48   86.5
2   57.80   65.00   68.00   64.00   75.0
3   59.43   56.00   52.00   52.00   66.0
4   55.50   85.80   73.60   73.30   96.8
... ... ... ... ... ...
210 74.49   80.60   82.00   77.60   91.0
211 53.62   58.00   60.00   72.00   74.0
212 69.72   67.00   67.00   73.00   59.0
213 60.23   74.00   66.00   58.00   70.0
214 60.22   62.00   58.00   53.00   89.0

I would like to filter the data based on this condition:

Extract all the rows where at least one value is greater than any of the values in this list:

[66.255, 75.7, 73.0, 72.0, 83.5]

tl;dr: the first row is a candidate since 67 and 91 are greater than more than one value in the list above (the values in the list are the 75th percentile of every column basically). Similarly, extract all such rows.

I tried using np.where() using the steps shown here but since that is filtering based only on one value and not a list, I can't figure how to extend that to this situation. A simple condition check would return rows only where all values are greater than equal I guess (unless I'm missing anything). Sorry if this sounds a little convoluted but not able to crack it yet.


Solution

  • a = np.array([66.255, 75.7, 73.0, 72.0, 83.5])
    
    df[df.apply(lambda x: (x.to_numpy() > a[:, None]).any(), axis = 1)]
    
         mba_p  ssc_p  hsc_p  degree_p  etest_p
    0    58.80  67.00  91.00     58.00     55.0
    1    66.28  79.33  78.33     77.48     86.5
    2    57.80  65.00  68.00     64.00     75.0
    4    55.50  85.80  73.60     73.30     96.8
    210  74.49  80.60  82.00     77.60     91.0
    211  53.62  58.00  60.00     72.00     74.0
    212  69.72  67.00  67.00     73.00     59.0
    213  60.23  74.00  66.00     58.00     70.0
    214  60.22  62.00  58.00     53.00     89.0