How to populate null values in Array with default value in Scala dataframe?


I have a dataframe with an array column. The array contains a list of values, some of which are null. I want to replace the nulls with a default value.

import spark.implicits._
val columns = Array("id", "subject")
val df1 = sc.parallelize(Seq(
  (1, Array("eng", "math", null, null))
)).toDF(columns: _*)

df1.printSchema
df1.show()

Solution

  • Use the TRANSFORM higher-order function to visit each element of the array, and the IFNULL function to substitute a default value when an element is null. See the code below.

    The example uses "0" as the default value; change it as required.

    val default = "0"
    
    df1
    .selectExpr(
        "id", 
        s"TRANSFORM(subject, s -> IFNULL(s, '${default}')) as subject"
    ).show(false)
    
    +---+-----------------+
    |id |subject          |
    +---+-----------------+
    |1  |[eng, math, 0, 0]|
    +---+-----------------+
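
    The same replacement can also be written with the DataFrame API instead of a SQL expression string. The following is a minimal sketch, assuming Spark 3.0 or later (where `transform` is available in `org.apache.spark.sql.functions`) and a local SparkSession created only for illustration; the app name and the `result` value are not from the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, lit, transform}

// Assumption: a local session for illustration; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("fill-array-nulls")
  .getOrCreate()
import spark.implicits._

// Same sample data as in the question.
val df1 = Seq((1, Array("eng", "math", null, null))).toDF("id", "subject")

val default = "0"

// For each array element, keep it if non-null, otherwise use the default.
val result = df1.withColumn(
  "subject",
  transform(col("subject"), s => coalesce(s, lit(default)))
)

result.show(false)
```

    Because the default is passed through `lit` rather than interpolated into a SQL string, a default value that contains a single quote does not break the expression.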