scala apache-spark dataframe apache-spark-sql case-class

Spark: Unsupported literal type class scala.collection.immutable.Nil$ List()

I have searched through other answers related to this question and they have not helped.

I am trying to add a column to a DataFrame. This column will have a datatype of Seq[CaseClass]. At first I thought Spark might not support collection-type columns, but that isn't the case.

Here is an example of the code I am trying to run. I just want to add an empty Seq[CaseClass] to each row that I can append to later on.

import org.apache.spark.sql.functions.lit

case class Employee(name: String)
val emptyEmployees: Seq[Employee] = Seq()
df.withColumn("Employees", lit(emptyEmployees))

But this error is thrown at the withColumn line:

Unsupported literal type class scala.collection.immutable.Nil$ List()
java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.Nil$ List()
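The message comes from Spark's untyped literal conversion: lit() passes the value to Literal.apply, which matches on the value's runtime class and throws for anything it does not recognise, printing the class followed by the value. At runtime an empty Seq() is just the Nil object, so the element type Employee is not visible at all. The Spark-free sketch below (the object and method names are hypothetical) contrasts what a runtime-class check can see with what a TypeTag context bound, the mechanism typedLit() relies on, can see.

```scala
import scala.reflect.runtime.universe._

// Stand-in for the question's case class.
case class Employee(name: String)

object ErasureDemo {
  // Roughly what an untyped API like lit() has to work with: only the runtime class.
  def runtimeClassName(x: Any): String = x.getClass.getName

  // What a TypeTag-based API can recover: the static element type chosen by the compiler.
  def elementType[T](xs: Seq[T])(implicit tag: TypeTag[T]): Type = tag.tpe

  def main(args: Array[String]): Unit = {
    val emptyEmployees: Seq[Employee] = Seq()
    // The runtime class is Nil$, with no trace of Employee.
    println(runtimeClassName(emptyEmployees))
    // The TypeTag still carries Employee, even though the Seq is empty.
    println(elementType(emptyEmployees))
  }
}
```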

Solution

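If you are using Spark 2.2+, just change lit() to typedLit(), according to this answer. Unlike lit(), typedLit() captures the argument's static type through a TypeTag, so it can work out the column type even from an empty Seq.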

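import org.apache.spark.sql.functions.typedLit
import spark.implicits._ // provides the Encoder[String] that createDataset needs

case class Employee(name: String)
val emptyEmployees: Seq[Employee] = Seq()
val df = spark.createDataset(Seq("foo")).toDF("foo")
df.withColumn("Employees", typedLit(emptyEmployees)).show()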

shows us:

+---+---------+
|foo|Employees|
+---+---------+
|foo|       []|
+---+---------+
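Because typedLit() derives the type from the TypeTag rather than from the value, the same call should also accept a non-empty Seq[Employee]. A minimal sketch, assuming Spark 2.2+ on the classpath and a local session; the object and the withEmployees helper are hypothetical names for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.typedLit

// Same shape as the question's case class; top level so Spark can reflect on it.
case class Employee(name: String)

object TypedLitDemo {
  // Adds the given employees as one literal array-of-struct column on every row.
  def withEmployees(df: DataFrame, employees: Seq[Employee]): DataFrame =
    df.withColumn("Employees", typedLit(employees))

  def main(args: Array[String]): Unit = {
    // Local session for the sketch; master and app name are arbitrary.
    val spark = SparkSession.builder().master("local[1]").appName("typedLit-demo").getOrCreate()
    import spark.implicits._

    val df = spark.createDataset(Seq("foo")).toDF("foo")
    val out = withEmployees(df, Seq(Employee("alice"), Employee("bob")))
    out.printSchema() // Employees should be an array of structs with a single name field
    out.show(truncate = false)

    spark.stop()
  }
}
```

The element type in the schema is a struct with a name field, so later appends of Employee values keep a consistent type.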

Update

For Spark 2.1, the linked answer's approach for that version converts your lit(Array) into an array() of lit()s, using Scala's ": _*" syntax to pass the sequence to array() as varargs. In your case this works because the sequence is empty, so lit() is never actually called on an Employee.

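import org.apache.spark.sql.functions.{array, lit}
import spark.implicits._ // provides the Encoder[String] that createDataset needs

// Wrap each element in lit() and pass them all to array() as varargs
def asLitArray[T](xs: Seq[T]) = array(xs.map(x => lit(x)): _*)

case class Employee(name: String)

val emptyEmployees: Seq[Employee] = Seq()
val df = spark.createDataset(Seq("foo")).toDF("foo")

df.withColumn("Employees", asLitArray(emptyEmployees)).show()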

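Which displays the same result (note that in Spark 2.x an argument-less array() defaults to an array of strings, not an array of Employee structs, since there are no elements to infer a type from):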

+---+---------+
|foo|Employees|
+---+---------+
|foo|       []|
+---+---------+
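The "magic" in asLitArray is the ": _*" ascription, which expands a Seq into individual arguments for a varargs parameter such as array(cols: Column*). A Spark-free sketch, where countArgs is a hypothetical varargs method standing in for array():

```scala
object SplatDemo {
  // Hypothetical varargs method standing in for Spark's array(cols: Column*).
  def countArgs(xs: String*): Int = xs.length

  def main(args: Array[String]): Unit = {
    val names = Seq("a", "b", "c")
    println(countArgs(names: _*))             // 3: the Seq is expanded into three arguments
    println(countArgs(Seq.empty[String]: _*)) // 0: an empty Seq expands to no arguments
  }
}
```

With an empty Seq, the call reduces to array() with no arguments, which is why the empty case never reaches lit().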

To actually put Employee values in your Seq would require a slightly different function, because lit() in 2.1 does not accept a case class instance.
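One possible shape for that function, as an untested sketch for 2.1: build each Employee as a struct() of per-field lit()s, alias each field to its case class name, and pass the structs to array(). The helper name employeesLit is hypothetical, and the field list must be kept in sync with the case class by hand.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{array, lit, struct}

case class Employee(name: String)

object StructArrayDemo {
  // Hypothetical helper: one struct per Employee, each field a named literal, wrapped in array().
  // lit() cannot take an Employee directly, so every field gets its own lit().
  def employeesLit(employees: Seq[Employee]): Column =
    array(employees.map(e => struct(lit(e.name).as("name"))): _*)

  def main(args: Array[String]): Unit = {
    // Local session for the sketch; master and app name are arbitrary.
    val spark = SparkSession.builder().master("local[1]").appName("struct-array-demo").getOrCreate()
    import spark.implicits._

    val df = spark.createDataset(Seq("foo")).toDF("foo")
    df.withColumn("Employees", employeesLit(Seq(Employee("alice"), Employee("bob")))).show(truncate = false)

    spark.stop()
  }
}
```

For an empty Seq this helper behaves exactly like asLitArray, since no struct() is ever built.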