How to add a new field to a nested array-of-struct column in Spark <= 2.3

I have a DataFrame with the schema below:

     |-- date: timestamp (nullable = true)
     |-- questionAnswerList: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- questionNumber: string (nullable = true)
     |    |    |-- listAnswers: array (nullable = true)
     |    |    |    |-- element: string (containsNull = true)

I want to add a new field to each struct inside the array, as in the schema below:

     |-- date: timestamp (nullable = true)
     |-- questionAnswerList: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- index: integer (nullable = true)
     |    |    |-- questionNumber: string (nullable = true)
     |    |    |-- listAnswers: array (nullable = true)
     |    |    |    |-- element: string (containsNull = true)

I tried to use a UDF like the one below:

val addIndexInStruct: UserDefinedFunction = udf((data: Seq[Row]) => data.zipWithIndex.map { case (Row(x: String, y: Array[String]), index) => (index, x, y) })


But I get the following error:

Caused by: scala.MatchError: ([Q10,WrappedArray(R10.1, R10.2)],0) (of class scala.Tuple2)

Does anybody have an idea how to do this in Spark 2.x? I saw in other posts that the transform function can be used in Spark 3.x.


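  • I finally solved it. Seq had to be used instead of Array in the pattern match, because Spark hands an array column to a UDF as a WrappedArray (a Seq), not as a Scala Array, as the error message shows.

    val addIndexInStruct: UserDefinedFunction = udf((data: Seq[Row]) => data.zipWithIndex.map { case (Row(x: String, y: Seq[String]), index) => (index, x, y) })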
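
    For anyone who wants to try this end to end, here is a minimal sketch of the fix applied to a small DataFrame. The object name, the two case classes, the withIndex helper and the sample rows are all made up for illustration; only the column and field names come from the question. Returning a case class rather than a tuple is my own choice, so that the new struct fields get proper names instead of _1, _2, _3. It assumes the array and its elements are never null.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}

object AddIndexExample {
  // Hypothetical case classes: the input struct and the output struct with the extra field.
  case class QuestionAnswer(questionNumber: String, listAnswers: Seq[String])
  case class IndexedAnswer(index: Int, questionNumber: String, listAnswers: Seq[String])

  // Pairs each struct with its position in the array.
  // The nested array is matched as a Seq because Spark passes it as a WrappedArray.
  // Assumption: no null arrays or null structs; a null would not match and would throw.
  def withIndex(data: Seq[Row]): Seq[IndexedAnswer] =
    data.zipWithIndex.map {
      case (Row(question: String, answers: Seq[_]), index) =>
        IndexedAnswer(index, question, answers.asInstanceOf[Seq[String]])
    }

  val addIndexInStruct: UserDefinedFunction = udf(withIndex _)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("add-index").getOrCreate()
    import spark.implicits._

    // Sample data invented for this sketch.
    val df = Seq(
      (java.sql.Timestamp.valueOf("2020-01-01 00:00:00"),
        Seq(QuestionAnswer("Q10", Seq("R10.1", "R10.2")), QuestionAnswer("Q11", Seq("R11.1"))))
    ).toDF("date", "questionAnswerList")

    // Overwrite the column with the indexed version of the array.
    val result = df.withColumn("questionAnswerList", addIndexInStruct(col("questionAnswerList")))
    result.printSchema()
    result.show(truncate = false)

    spark.stop()
  }
}
```

    One difference from the target schema in the question: because index is a Scala Int here, Spark will most likely report it as non-nullable. If that matters, java.lang.Integer could be used for the field instead.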