I have a PySpark DataFrame and I want to implement the following logic for a new column:
if col1 is not None:
    if col1 > 17:
        return False
    else:
        return True
return None
I have implemented it in the following way:
out = out.withColumn('col2', out.withColumn(
'col2', when(col('col1').isNull(), None).otherwise(
when(col('col1') > 17, False).otherwise(True)
)))
However, when I run this I get the following error:
assert isinstance(col, Column), "col should be Column"
AssertionError: col should be Column
Any ideas what I might be doing wrong?
I think the problem comes from the typo you made: you write out.withColumn twice, nesting one call inside the other.
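In other words, the nested call should be flattened into a single withColumn. A minimal sketch of the fix, reusing your out DataFrame and column names:

from pyspark.sql.functions import col, when

out = out.withColumn(
    "col2",
    when(col("col1").isNull(), None).otherwise(
        when(col("col1") > 17, False).otherwise(True)
    ),
)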
Here is my code:
from pyspark.sql import functions as F
a = [
(None,),
(16,),
(18,),
]
b = [
"col1",
]
df = spark.createDataFrame(a, b)
df.withColumn(
"col2",
F.when(F.col("col1").isNull(), None).otherwise(
F.when(F.col("col1") > 17, False).otherwise(True)
),
).show()
+----+-----+
|col1| col2|
+----+-----+
|null| null|
| 16| true|
| 18|false|
+----+-----+
You can also do it a bit differently: you do not need the first otherwise, or you do not even need to test the NULL case explicitly:
df.withColumn(
"col2",
F.when(F.col("col1").isNull(), None)
.when(F.col("col1") > 17, False)
.otherwise(True),
).show()
# OR
df.withColumn(
"col2", F.when(F.col("col1") > 17, False).when(F.col("col1") <= 17, True)
).show()
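Note that the last version still returns NULL when col1 is NULL: a comparison against NULL evaluates to NULL, so neither when branch matches, and a when chain without an otherwise yields NULL for unmatched rows.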