Python: Find the name of the columns where NA values are above a threshold

I know how to drop columns with many NA from a data frame,

df = df.dropna(axis=1, thresh=100)

But is there a way to know their names?

I though to create something like this:

na_values =  df.isnull().sum()/len(df)

Which gives me a list with the portion of NA within each columns

x_9       0.011700
x_10      0.001307
x_11      0.025830
x_12      0.011691
x_13      0.002739
x_14      0.238406

But I want to extract only the ones that have above a 0.2 threshold, like x_14.

Any ideas? I tried:

print(i > 0.4 for i in na_values)

but it doesn't work

Solution

You can use the same Series with a boolean condition to mask the Series:

In [36]:
na_values[na_values>0.2]

Out[36]:
            na
val           
x_14  0.238406

You can see the mask:

In [37]:
na_values > 0.2

Out[37]:
val
x_9     False
x_10    False
x_11    False
x_12    False
x_13    False
x_14     True
Name: na, dtype: bool

so this will mask out all the False rows