I would like to rewrite this part as a function in PySpark:
df = (df.withColumn("January", F.lit(None).cast('double'))
.withColumn("February", F.lit(None).cast('double'))
.withColumn("March", F.lit(None).cast('double'))
.withColumn("April", F.lit(None).cast('double'))
.withColumn("May", F.lit(None).cast('double'))
.withColumn("June", F.lit(None).cast('double'))
.withColumn("July", F.lit(None).cast('double'))
.withColumn("August", F.lit(None).cast('double'))
.withColumn("September", F.lit(None).cast('double'))
.withColumn("October", F.lit(None).cast('double'))
.withColumn("November", F.lit(None).cast('double'))
.withColumn("December", F.lit(None).cast('double'))
)
You can use withColumns (available since Spark 3.3) instead of a chain of withColumn calls:
months = ["January", ... , "December"]
df = df.withColumns(
{month: F.lit(None).cast('double') for month in months}
)
Documentation is here: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.withColumns.html