Say I have a dataframe in Pandas consisting of several columns. I would like to select the subset of the dataframe where all columns are between two values min
and max
.
How can I do this using query
? Do I need to specify every column name one by one in the expression e.g.:
df.query('(A<{max} & A>{min}) & (B{min} & B{max}) & (C{min} & C{max})'.format(min,max))
Aside from query, what other alternatives do I have? Perhaps working with the indices where the condition is met directly?
For your use case I'd do it the non-query way, constructing a boolean frame and then calling the .all
method:
>>> minval, maxval = 20, 80
>>> df = pd.DataFrame(np.random.randint(0, 100, (20,6)))
>>> df[((df > minval) & (df < maxval)).all(axis=1)]
0 1 2 3 4 5
2 74 30 30 76 31 66
10 49 39 71 43 30 50
I don't think the advantage of manually constructing an expanded query
would be worth it.