I have a Boolean column that is sometimes NULL, and I want to write NULL values to it. My code:
from pyspark.sql import functions as F
df = df.withColumn('my_column_name', F.lit(None).cast("string"))
My error: Column type: BOOLEAN, Parquet schema: optional byte_array
My attempted solution was
df = df.withColumn('my_column_name', F.lit(None).cast('boolean'))
but I'm not sure this is correct.
With the F.when function you can assign NULL conditionally; for your case that would be:
df = df.withColumn('my_column_name',
                   F.when(F.col("condition_column").isNull(), None)
                    .otherwise(F.col("condition_column")))
If you just want a null column, use lit(None) together with a cast to create a null literal of the appropriate datatype. That is exactly your attempted solution, F.lit(None).cast('boolean'), and it is correct: your original error came from casting to "string", which writes a byte_array that conflicts with the BOOLEAN type in the existing Parquet schema.