
Pandas - select dataframe columns if statistic is greater than certain value


I have a pandas dataframe df. I would like to select the columns whose standard deviation is greater than 1. Here is what I tried:

df2 = df[df.std() >1]
df2 = df.loc[df.std() >1] 

Both raised an error. What am I doing wrong?


Solution

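  • df.std() returns a Series indexed by column name, so df.std() > 1 is a boolean mask over columns. Both df[...] and df.loc[...] treat a boolean mask as a row filter, which is why the two attempts fail. Instead, we need to get the list of columns whose values have a standard deviation greater than 1.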

    That list of columns can then be passed to the dataframe to select the relevant data.

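    Be mindful to remove the columns of type "object" before trying to get the list. Depending on the pandas version, df.std() either fails on such columns or skips them, in which case the mask no longer lines up with df.columns. The line below gets the list of columns.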

    df.columns[(df.std() > 1).to_list()]
    

    The line below returns the dataframe with only the selected columns.

    df[df.columns[(df.std() > 1).to_list()]]
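For reference, here is a minimal end-to-end sketch of the same idea. The sample data and column names ("a", "b", "c", "label") are made up for illustration. It passes numeric_only=True to std() so that the "object" column is skipped rather than dropped by hand. This is a variation on the approach above, not the only way to do it.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "a": [1.0, 1.1, 0.9, 1.0],      # low spread
    "b": [1.0, 5.0, 9.0, 13.0],     # high spread
    "c": [10.0, 10.5, 9.5, 10.0],   # low spread
    "label": ["w", "x", "y", "z"],  # non-numeric ("object") column
})

# Per-column sample standard deviation, restricted to numeric columns.
# The result is a Series indexed by column name.
stds = df.std(numeric_only=True)

# Keep only the column names whose standard deviation exceeds 1.
high_std_cols = stds[stds > 1].index

# Select those columns from the original dataframe.
df2 = df[high_std_cols]
print(df2)
```

Taking the names from stds (rather than indexing df.columns with the mask) keeps the selection aligned even when non-numeric columns were skipped.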