python pandas column sorting

How do I select columns from a pandas DataFrame whose average value is greater than some limit?


I have a DataFrame with multiple columns. Each column is a time series of some variable. I want to keep only the columns that are significant by some metric, i.e. the subset of columns for which either

  1. the average (over all rows) is greater than x, or
  2. the max (over all rows) is greater than x

    i | col1 | col2 | col3 | ...
    0 | 0.10 | 0.50 | 0.30 | ...
    1 | 0.09 | 0.40 | 0.40 | ...
    2 | 0.08 | 0.45 | 0.36 | ...

Say that, from the table above, I want to pick only [col2, col3], using the condition column_avg > 0.2.

Or only col2, using the condition column_avg > 0.4.

Similarly, instead of conditioning on the average, I would like to condition on the min or max of each column.


Solution

  • Try this:

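    # keep the columns whose mean over all rows exceeds 0.2
    df2 = df[df.columns[df.mean(axis=0) > 0.2]]
    # keep the columns whose max over all rows exceeds 0.4
    df3 = df[df.columns[df.max(axis=0) > 0.4]]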
    

    df.min(axis=0) works the same way when the condition is on each column's minimum.
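For anyone who wants to try this on the sample data from the question, here is a minimal sketch. The helper name `select_columns` and its parameters are made up for illustration (they are not part of pandas), and it uses `df.loc[:, mask]` with a boolean mask, which should select the same columns as the `df[df.columns[...]]` form above.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "col1": [0.10, 0.09, 0.08],
    "col2": [0.50, 0.40, 0.45],
    "col3": [0.30, 0.40, 0.36],
})


def select_columns(frame, stat="mean", threshold=0.0):
    """Hypothetical helper: keep columns whose per-column statistic exceeds threshold."""
    # frame.agg(stat) computes one value per column ("mean", "min", "max", ...).
    mask = frame.agg(stat) > threshold
    # A boolean Series indexed by column name selects columns via .loc.
    return frame.loc[:, mask]


by_mean = select_columns(df, "mean", 0.2)
by_mean_strict = select_columns(df, "mean", 0.4)
by_min = select_columns(df, "min", 0.35)

# The "average OR max" case from the question: combine two masks with |.
either = df.loc[:, (df.mean() > 0.4) | (df.max() > 0.35)]

print(list(by_mean.columns))         # ['col2', 'col3']
print(list(by_mean_strict.columns))  # ['col2']
print(list(by_min.columns))          # ['col2']
print(list(either.columns))          # ['col2', 'col3']
```

Note the parentheses around each comparison when combining masks: `|` binds more tightly than `>` in Python, so leaving them out changes the meaning.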