Search code examples
pythonpandasnumexpr

Selecting with query where conditions are met for all columns


Say I have a dataframe in Pandas consisting of several columns. I would like to select the subset of the dataframe where all columns are between two values min and max.

How can I do this using query? Do I need to specify every column name one by one in the expression e.g.:

df.query('(A<{max} & A>{min}) & (B{min} & B{max}) & (C{min} & C{max})'.format(min,max))

Aside from query, what other alternatives do I have? Perhaps working with the indices where the condition is met directly?


Solution

  • For your use case I'd do it the non-query way, constructing a boolean frame and then calling the .all method:

    >>> minval, maxval = 20, 80
    >>> df = pd.DataFrame(np.random.randint(0, 100, (20,6)))
    >>> df[((df > minval) & (df < maxval)).all(axis=1)]
         0   1   2   3   4   5
    2   74  30  30  76  31  66
    10  49  39  71  43  30  50
    

    I don't think the advantage of manually constructing an expanded query would be worth it.