I have table name "data" which having 5 columns and each column contain some null values. i want to take a count of each column's null value how can i write code for that result! its easy to take count of one column but how can i write code for counting each column of table.
sample :
+----------------+----------------+--------+---------+-------------+
| 2 |3 |4 | 5 |6 |
+----------------+----------------+--------+---------+-------------+
|null |1 | null |null |null |
|null |null | null |null |asdc |
|null |23 | 23 |null |null |
|null |null | null |23 |41 |
|24 |3 | 35 |null |null |
|null |null | null | 1 |wef |
|null |32 | 54 |null |45 |
|null |null | null |123 |null |
|w411 |31 | 12 |null |null |
|null |null | null |11 |null |
+----------------+----------------+--------+---------+-------------+
how take null count of each column
I have 40 tables which contain 5 or 6 or 10 columns and each column contain some null values i just want to take null count of each column of tables which is the best way to take null count!
Thanks in advance!
If you don't want to drop empty rows/columns and you don't need to do any additional calculations in you job, this should work for your:
df.select(df.columns.map(colName => {
count(when(col(colName).isNull, true)) as s"${colName}_nulls_count"
}): _*)
.show(10) // or save result somewhere