
How to create a NULL Boolean column in a pyspark dataframe


I have a Boolean column that is sometimes NULL and want to assign it as such. My code:

from pyspark.sql import functions as F
df = df.withColumn('my_column_name', F.lit(None).cast("string"))

The error I get when writing to Parquet: Column type: BOOLEAN, Parquet schema: optional byte_array

My attempted solution was df = df.withColumn('my_column_name', F.lit(None).cast('boolean')) but this doesn't seem right


Solution

  • Your attempted solution is actually correct: F.lit(None).cast('boolean') creates a null literal with the BOOLEAN datatype, so the column is written to Parquet as BOOLEAN rather than as a string (byte_array). The error comes from your original cast("string"), which gives the column a type that conflicts with the existing Parquet schema.

  • If instead you want the column to be NULL only under a condition, the F.when function lets you assign NULL conditionally:

    df = df.withColumn('my_column_name', F.when(F.col("condition_column").isNull(), None).otherwise(F.col("condition_column")))

    Here the column inherits the datatype of condition_column. If you need to force a specific datatype on a null literal, use lit(None) with a cast, as above.