Tags: scala, apache-spark, apache-spark-sql

How to take a count of null values from a table using Spark Scala?


I have a table named "data" which has 5 columns, and each column contains some null values. I want to take a count of each column's null values. How can I write code for that? It is easy to take the count for one column, but how can I write code to count the nulls in every column of the table?

Sample:

+----+----+----+----+----+
|   2|   3|   4|   5|   6|
+----+----+----+----+----+
|null|   1|null|null|null|
|null|null|null|null|asdc|
|null|  23|  23|null|null|
|null|null|null|  23|  41|
|  24|   3|  35|null|null|
|null|null|null|   1| wef|
|null|  32|  54|null|  45|
|null|null|null| 123|null|
|w411|  31|  12|null|null|
|null|null|null|  11|null|
+----+----+----+----+----+

How can I take the null count of each column?

I have 40 tables, each containing 5, 6, or 10 columns, and each column contains some null values. I just want to take the null count of each column of every table. What is the best way to do that?

Thanks in advance!


Solution

  • If you don't want to drop empty rows/columns and you don't need to do any additional calculations in your job, this should work for you:

      import org.apache.spark.sql.functions.{col, count, when}

      // Build one aggregate per column: count the rows where that column is null.
      df.select(df.columns.map { colName =>
        count(when(col(colName).isNull, true)).as(s"${colName}_nulls_count")
      }: _*)
        .show(10) // or save the result somewhere
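
    Since the question mentions having around 40 tables, the same expression can be wrapped in a small loop. This is a minimal sketch, assuming the tables are registered in the Spark catalog; the `tableNames` list here is hypothetical and should be replaced with your own table names:

      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.functions.{col, count, when}

      val spark = SparkSession.builder().getOrCreate()

      // Hypothetical list of table names; replace with the actual tables.
      val tableNames = Seq("data", "table_2", "table_3")

      tableNames.foreach { tableName =>
        val df = spark.table(tableName)
        // One null-count aggregate per column of this table.
        val nullCounts = df.select(df.columns.map { colName =>
          count(when(col(colName).isNull, true)).as(s"${colName}_nulls_count")
        }: _*)
        println(s"Null counts for table: $tableName")
        nullCounts.show(truncate = false)
      }

    Each iteration produces a single-row DataFrame with one `<column>_nulls_count` column per source column, which you can print as above or write out somewhere for later inspection.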