python pandas filter any

How to select all rows which contain values *in selected columns* greater than a threshold?


I'm trying to do the same thing as in this question, but I have a string-type column that I need to keep in the dataframe so I can identify which rows are which. (I guess I could do this by index, but I'd like to save a step.) Is there a way to exclude a column from the .any() check but keep it in the resulting dataframe? Thanks!

Here's the code that works on all columns:

df[(df > threshold).any(axis=1)]

Here's the hard-coded version I'm working with right now:

df[(df[list_of__selected_columns] > 3).any(axis=1)]

This seems a little clumsy to me, so I'm wondering if there's a better way.


Solution

  • You can use .select_dtypes to choose, say, all numerical columns:

    df[df.select_dtypes(include='number').gt(threshold).any(axis=1)]
    

    Or a chunk of contiguous columns with iloc:

    df[df.iloc[:,3:6].gt(threshold).any(axis=1)]
    

    If you want to select an arbitrary list of columns, your best bet is a hard-coded list, as in the question.
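
To make the three options concrete, here is a minimal sketch on a small made-up dataframe. The column names (`name`, `x`, `y`, `z`), the data, and the `selected` list are assumptions for illustration only, not taken from the question. In each case the boolean mask is built from a subset of columns, while the full dataframe (including the string column) is what gets indexed.

```python
import pandas as pd

# Hypothetical sample data; the column names and values are invented for illustration.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],   # string column to keep but not test
    "x": [1, 5, 2, 0],
    "y": [2, 1, 2, 9],
    "z": [0, 0, 4, 1],
})
threshold = 3

# 1) Build the mask from numeric columns only; "name" is skipped by the mask
#    but still present in the result because the full df is indexed.
by_dtype = df[df.select_dtypes(include="number").gt(threshold).any(axis=1)]

# 2) Build the mask from a contiguous block of columns by position
#    (positions 1 and 2 here, i.e. "x" and "y").
by_position = df[df.iloc[:, 1:3].gt(threshold).any(axis=1)]

# 3) Build the mask from an explicit list of column names.
selected = ["x", "z"]
by_list = df[df[selected].gt(threshold).any(axis=1)]

print(by_dtype)
print(by_position)
print(by_list)
```

With this data, the first filter keeps rows b, c and d, the second keeps b and d, and the third keeps b and c. All three results still carry the `name` column.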