I have this DataFrame:
+------+-------------------+-----------+
|brand |original_timestamp |weight     |
+------+-------------------+-----------+
|BR1   |1632899456         |4.0        |
|BR2   |1632899456         |null       |
|BR3   |1632899456         |2.0        |
|BR4   |1632899155         |2.0        |
|BR5   |1632899155         |null       |
+------+-------------------+-----------+
I would like to drop (filter out) the null values and print a message for each dropped row, such as:
"Weight for brand BR2 is null, dropping it from the data"
"Weight for brand BR5 is null, dropping it from the data"
I am using Spark 3.2.2 and SQLContext, with the Scala language.
You can use the filter function with isNull and isNotNull to get the desired output.

Filter null values using isNull:
df.filter($"weight".isNull).show(false)
+-----+------------------+------+
|brand|original_timestamp|weight|
+-----+------------------+------+
|BR2  |1632899456        |null  |
|BR5  |1632899155        |null  |
+-----+------------------+------+
Print the message:
import spark.implicits._ // needed for the $ syntax and .as[String]

df.filter($"weight".isNull)
  .selectExpr("""
    concat(
      '\"Weight for brand ',
      brand,
      ' is null, dropping it from the data\"'
    ) AS message
  """)
  .as[String]
  .collect
  .foreach(println)
"Weight for brand BR2 is null, dropping it from the data"
"Weight for brand BR5 is null, dropping it from the data"
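If you want to do both steps in one pass — print a message per null row and keep only the non-null rows — here is a minimal, self-contained sketch. It assumes a local SparkSession (the builder settings are illustrative, not from your setup) and rebuilds the sample data with Option[Double] so the None values become nulls:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .appName("drop-null-weights")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Rebuild the sample data; None becomes null in the weight column
val df = Seq(
  ("BR1", 1632899456L, Some(4.0)),
  ("BR2", 1632899456L, None),
  ("BR3", 1632899456L, Some(2.0)),
  ("BR4", 1632899155L, Some(2.0)),
  ("BR5", 1632899155L, None)
).toDF("brand", "original_timestamp", "weight")

// Log each brand whose weight is null...
df.filter($"weight".isNull)
  .select($"brand").as[String].collect
  .foreach(b => println(s"Weight for brand $b is null, dropping it from the data"))

// ...then keep only the rows with a non-null weight
val cleaned = df.filter($"weight".isNotNull)
cleaned.show(false)
```

Collecting only the brand column keeps the driver-side work small; the actual drop stays as a lazy filter on the DataFrame.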
Filter non-null values using isNotNull:
df.filter($"weight".isNotNull).show(false)
+-----+------------------+------+
|brand|original_timestamp|weight|
+-----+------------------+------+
|BR1 |1632899456 |4.0 |
|BR3 |1632899456 |2.0 |
|BR4 |1632899155 |2.0 |
+-----+------------------+------+
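As a side note, if you only need the drop (without the per-row message), DataFrameNaFunctions gives you an equivalent one-liner — a sketch, using the same df as above:

```scala
// Equivalent to df.filter($"weight".isNotNull) when restricted to one column
val cleaned = df.na.drop(Seq("weight"))
cleaned.show(false)
```

Passing the column list restricts the null check to weight; calling na.drop() with no arguments would instead drop rows with a null in any column.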