How to detect all-null columns in PySpark

I have a DataFrame that contains some null values, and some of its columns are entirely null.

>> df.show()
|  A|  B|  C|   D|

In my case, I want to return a list of the names of the columns that are filled with null values. My idea was to detect the constant columns, since an all-null column holds the same (null) value in every row.

This is how I did it:

from pyspark.sql.functions import min, max

nullColumns = [c for c, const in df.select([(min(c) == max(c)).alias(c) for c in df.columns]).first().asDict().items() if const]

but this does not consider null columns as constant; it only works on actual values. How should I do it then?


  • Extend the condition to

    from pyspark.sql.functions import min, max
    ((min(c).isNull() & max(c).isNull()) | (min(c) == max(c))).alias(c) 

    or use eqNullSafe (PySpark 2.3+), which treats NULL == NULL as true:

        min(c).eqNullSafe(max(c)).alias(c)
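    Put together, a minimal end-to-end sketch of the null-safe variant (toy data; column names are illustrative):

    ```python
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import min, max  # Spark's min/max, not the Python builtins

    spark = SparkSession.builder.master("local[1]").getOrCreate()

    # Toy frame: columns C and D are entirely null
    df = spark.createDataFrame(
        [(1, "x", None, None), (2, "y", None, None)],
        "A INT, B STRING, C STRING, D INT",
    )

    # eqNullSafe treats NULL == NULL as True, so all-null columns are
    # flagged as constant alongside ordinary constant columns
    row = df.select([min(c).eqNullSafe(max(c)).alias(c) for c in df.columns]).first()
    nullColumns = [c for c, const in row.asDict().items() if const]
    print(nullColumns)  # ['C', 'D']
    ```

    Note that this flags any constant column, which matches the question's constant-column idea; a column that is constant but non-null would be flagged too.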