This is my first post so let me know if I need to give more details.
I am trying to create a boolean column, "immediate", that shows true when at least on of the columns has some data in it. If all are null then the column should be false. I am using the when() .otherwise function in spark but I'm not getting the result I would expect.
Below is the code I'm using:
val evaluation = evaluation_raw
.withColumn("immediate",
when(col("intended_outcome_review").isNull
&& col("outcome").isNull
&& col("impact").isNull
&& col("impact_self").isNull
&& col("next_step").isNull,
lit(false))
.otherwise(lit(true)))
.select(
col("id"),
col("intended_outcome_review"),
col("outcome"),
col("impact"),
col("impact_self"),
col("next_step"),
col("immediate"))
Desired outcome:
+--------+------------------------+-------------+-------+------------+----------+----------+
|id |intended_outcome_review |outcome |impact |impact_self |next_step |immediate |
+--------+------------------------+-------------+-------+------------+----------+----------+
|1568 |null |null |4 |3 |null |true |
|1569 |null |null |null |null |null |false |
|1570 |null |null |null |null |null |false |
|1571 |1 |improved coms|3 |3 |email prof|true |
+--------+------------------------+-------------+-------+------------+----------+----------+
Actual outcome:
+--------+------------------------+-------------+-------+------------+----------+----------+
|id |intended_outcome_review |outcome |impact |impact_self |next_step |immediate |
+--------+------------------------+-------------+-------+------------+----------+----------+
|1568 |null |null |4 |3 |null |true |
|1569 |null |null |null |null |null |true |
|1570 |null |null |null |null |null |false |
|1571 |1 |improved coms|3 |3 |email prof|true |
+--------+------------------------+-------------+-------+------------+----------+----------+
If anyone knows what I may be doing wrong please let me know. Thanks!
Turns out some of the columns are converted from Null to "" when other parts of the form are filled out.
Answer below considers empty strings and Null values:
.withColumn("immediate",
when((col("intended_outcome_review").isNull || col("intended_outcome_review") ==="")
&& (col("outcome").isNull || col("outcome") === "")
&& (col("impact").isNull || col("outcome") === "")
&& (col("impact_self").isNull || col("impact_self") === "")
&& (col("next_step").isNull || col("next_step") === ""),
lit(false))
.otherwise(lit(true)))