scala apache-spark

Spark withColumn null default value


I am trying to add a new String column to a dataframe with a default value of null (a non-null value will be applied later).

Here is my code:

.withColumn("column-name", lit(null: String))

This creates a column with the Void type, which I do not want.

What is the easiest way to create a column of type String with null default value?

Note: the structure of the set of jobs is set in stone, and I am leaving this company very soon, so I am not interested in arguing that the code should be restructured. I just want to give them the code they have asked for with the least fuss.

Note also that we aren't using a code-defined schema anywhere; it is pure schema inference.


Solution

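  • You can use lit with null, then cast it to your desired type. The type ascription in lit(null: String) only exists at compile time, so lit still receives a bare null and gives the column Spark's NullType; the cast is what makes it a String column.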

    Example

    df.withColumn("test", lit(null).cast(StringType))
    

    Output

    +---+----+
    |id |test|
    +---+----+
    |1  |null|
    |2  |null|
    |3  |null|
    +---+----+
    

    Schema

    root
     |-- id: integer (nullable = false)
     |-- test: string (nullable = true)
    

    Good luck!
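
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes a local SparkSession and a throwaway three-row dataframe; the names NullStringColumnExample and withNullStringColumn are made up for illustration and are not part of Spark. It also includes the imports that the one-liner above needs.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.StringType

// Hypothetical wrapper object, only here to make the example runnable.
object NullStringColumnExample {

  // Adds a column called `name` whose values are all null,
  // typed as String instead of NullType thanks to the cast.
  def withNullStringColumn(df: DataFrame, name: String): DataFrame =
    df.withColumn(name, lit(null).cast(StringType))

  def main(args: Array[String]): Unit = {
    // Assumption: running locally; in a real job the session already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("null-string-column")
      .getOrCreate()
    import spark.implicits._

    // Throwaway sample data with a single integer column "id".
    val df = Seq(1, 2, 3).toDF("id")

    val result = withNullStringColumn(df, "test")
    result.printSchema() // "test" should be reported as string, nullable
    result.show(false)

    spark.stop()
  }
}
```

If you would rather skip the StringType import, cast also accepts the type name as a string, so lit(null).cast("string") should give the same schema.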