Tags: scala, apache-spark, generics, implicit, scala-2.12

Parametric polymorphism issue with spark implicits, value toDF is not a member of Seq[T]


This is on Scala 2.12.10 and Spark 2.4.8. I am trying to define a trait with a method that converts an array of some case class into a DataFrame. The type parameter is meant to be some schema (the case class) that extends QuestionSchema, hence T <: schemas.QuestionSchema. I import the Spark implicits so that I can convert the data to a DataFrame after converting it to a Seq, but it does not compile. Can anyone see what is wrong here, or suggest another way to do this?

trait DataStore {
  var data1 = Array.empty[Data1Type]
  var data2 = Array.empty[Data2Type]
  def convertToDf[T <: schemas.QuestionSchema](res: Array[T])(implicit spark: SparkSession): DataFrame = {
    import spark.implicits._
    res.toSeq.toDF()  // value toDF is not a member of Seq[T]
  }
}

Solution

  • Add a context bound for org.apache.spark.sql.Encoder:

    import org.apache.spark.sql.Encoder

    def convertToDf[T <: schemas.QuestionSchema : Encoder](res: Array[T])(implicit spark: SparkSession): DataFrame = {
      import spark.implicits._  // still needed: provides the toDF conversion itself
      res.toSeq.toDF()          // the Encoder[T] it requires is now supplied by the caller
    }
    

    The thing is that Spark defines an instance of the type class Encoder for, among other things, any type T that is a case class. Inside the method convertToDf, however, T is not a case class yet; it is still an abstract type (more precisely, a method type parameter). It only becomes a concrete case-class type when you call the method. Adding the context bound (which is the same as def convertToDf[T <: QuestionSchema](res: Array[T])(implicit spark: SparkSession, enc: Encoder[T])) postpones implicit resolution from the definition site of the method to the call site. So you should add import spark.implicits._ at the call site, where the concrete case class is known and its Encoder can be found.
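    This postponement of implicit resolution is not Spark-specific; it can be illustrated with a minimal, Spark-free type class. In the sketch below, Enc stands in for Encoder and all names (Enc, Question, Instances, convert) are illustrative, not part of any library:

    ```scala
    // A concrete case class, analogous to one of your schema case classes.
    case class Question(text: String)

    // A stand-in type class playing the role of Spark's Encoder.
    trait Enc[T] { def encode(t: T): String }

    object Instances {
      // The instance exists only for the concrete case class, just as
      // spark.implicits._ provides Encoders only for concrete product types.
      implicit val questionEnc: Enc[Question] =
        new Enc[Question] { def encode(q: Question): String = s"Question(${q.text})" }
    }

    // Context bound [T: Enc]: the method demands an Enc[T] but does not
    // resolve it here -- resolution is deferred to each call site.
    def convert[T: Enc](xs: Seq[T]): Seq[String] =
      xs.map(implicitly[Enc[T]].encode)

    // Call site: the import here is analogous to `import spark.implicits._`
    // at the place where you call convertToDf with a concrete case class.
    import Instances._
    val encoded = convert(Seq(Question("why?")))
    ```

    Inside convert, T is abstract and no Enc[T] instance could be found; at the call site T = Question is concrete, so the implicit lookup succeeds.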

    See also:

      • How to resolve implicit lookup by bounded generic?
      • Typeclasses methods called in functions with type parameters
      • Why the Scala compiler can provide implicit outside of object, but cannot inside?
      • How to define induction on natural numbers in Scala 2.13?