
PySpark: how to write this as a generic function


I would like to rewrite this part as a generic PySpark function:

from pyspark.sql import functions as F

df = (df.withColumn("January", F.lit(None).cast('double'))
        .withColumn("February", F.lit(None).cast('double'))
        .withColumn("March", F.lit(None).cast('double'))
        .withColumn("April", F.lit(None).cast('double'))
        .withColumn("May", F.lit(None).cast('double'))
        .withColumn("June", F.lit(None).cast('double'))
        .withColumn("July", F.lit(None).cast('double'))
        .withColumn("August", F.lit(None).cast('double'))
        .withColumn("September", F.lit(None).cast('double'))
        .withColumn("October", F.lit(None).cast('double'))
        .withColumn("November", F.lit(None).cast('double'))
        .withColumn("December", F.lit(None).cast('double')))


Solution

  • You can use withColumns (available since Spark 3.3) instead of chaining multiple withColumn calls:

    months = ["January", ... , "December"]
    df = df.withColumns(
        {month: F.lit(None).cast('double') for month in months}
    )
    

    Documentation is here: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.withColumns.html
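    If you would rather not type out all twelve month names for the months list, one option is to generate them with the standard library's calendar module (a sketch; note that calendar.month_name is locale-dependent, so this yields English names only under the default locale):

    ```python
    import calendar

    # calendar.month_name is a sequence whose index 0 is an empty
    # string (so that month_name[1] == "January"); skip that entry.
    months = list(calendar.month_name)[1:]

    # months is now ["January", "February", ..., "December"], ready to
    # feed into the dict comprehension passed to withColumns.
    ```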