I have a data frame with multiple columns. Each column is a time series of some variable. I only want to keep the columns that are significant by some metric, i.e. I want to pick the subset of columns for which a per-column statistic, such as the max (over all rows), is greater than some threshold x.
i | col1 | col2 | col3 | ...
0 | 0.10 | 0.50 | 0.30 | ...
1 | 0.09 | 0.40 | 0.40 | ...
2 | 0.08 | 0.45 | 0.36 | ...
Let's say, from the table above, I want to pick only [col2, col3] with the condition column_avg > 0.2, or only col2 with the condition column_avg > 0.4.
Similarly, instead of conditioning on the average, I want to be able to condition on the min or max of each column.
Try this:
df2 = df[df.columns[df.mean(axis=0) > 0.2]]
df3 = df[df.columns[df.max(axis=0) > 0.4]]
df.min works the same way.
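To make this concrete, here is a minimal, self-contained sketch using the three sample columns from the question. The helper name select_columns and the 0.35 threshold for the min case are my own, chosen for illustration only. It uses df.loc with a boolean mask over the columns, which should be equivalent to the df[df.columns[...]] form above.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "col1": [0.10, 0.09, 0.08],
    "col2": [0.50, 0.40, 0.45],
    "col3": [0.30, 0.40, 0.36],
})

def select_columns(frame, stat, threshold):
    """Keep the columns whose per-column statistic exceeds threshold.

    select_columns is a hypothetical helper, not a pandas function.
    stat is an aggregation name understood by DataFrame.agg,
    e.g. "mean", "min" or "max".
    """
    scores = frame.agg(stat)            # one value per column
    return frame.loc[:, scores > threshold]

by_mean_02 = select_columns(df, "mean", 0.2)   # col2 and col3
by_mean_04 = select_columns(df, "mean", 0.4)   # col2 only
by_max = select_columns(df, "max", 0.4)        # col2 only (col3's max is exactly 0.4)
by_min = select_columns(df, "min", 0.35)       # col2 only

print(list(by_mean_02.columns))
print(list(by_mean_04.columns))
```

Note that the comparison is strict, so a column whose statistic equals the threshold is dropped; use >= if you want to keep it.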