
Generic class to read csv in scala


I am new to Scala and I am trying to build a framework that can read multiple types of CSV files, with all read operations going through one class. For example, I have two types of CSVs, Student and Professor, and I am doing something like this:

abstract class Person
case class Student(name: String, major: String, marks: Double) extends Person
case class Professor(name: String, salary: Double) extends Person

My CSV reader looks something like this:

  private def readCsv[T: Encoder](location: String) = {
    spark
      .read
      .option("header", "true")
      .option("inferSchema", "true")
      .option("delimiter", ";")
      .csv(location)
      .as[T]
  }

def data[T <: Person](location: String): Dataset[Person] = readCsv[Person](location)

I am getting a compile-time error on the last line: No implicit arguments of type: Encoder[Person]. The call to this method looks something like this:

val studentData = storage.data[Student]("Student.csv")

Is there any better way to achieve this?


Solution

    1. Your ADT definition should be sealed (and the case classes final); otherwise it is hard to derive Encoders for it.
    2. IIRC, Spark sadly does not support sum types, because there is no schema representation for them. A somewhat common hack is to represent Either[A, B] as (Option[A], Option[B]), but yeah, it's a pain.
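Concretely, the fix in point 1 boils down to: seal the hierarchy, and keep the typeclass constraint on the generic method so it is resolved at the concrete call site (data[Student], never data[Person]). The same shape can be sketched without a Spark runtime by using a hypothetical RowDecoder typeclass standing in for Spark's Encoder (RowDecoder and readRows are illustrative names, not Spark APIs):

```scala
// Sealed hierarchy with final case classes, as in point 1.
sealed abstract class Person
final case class Student(name: String, major: String, marks: Double) extends Person
final case class Professor(name: String, salary: Double) extends Person

// Hypothetical stand-in for Spark's Encoder: knows how to parse one CSV row.
trait RowDecoder[T] { def decode(fields: Array[String]): T }

object RowDecoder {
  implicit val studentDecoder: RowDecoder[Student] =
    fields => Student(fields(0), fields(1), fields(2).toDouble)
  implicit val professorDecoder: RowDecoder[Professor] =
    fields => Professor(fields(0), fields(1).toDouble)
}

// Generic reader: the context bound is resolved where T is concrete,
// just as `readCsv[T: Encoder]` is resolved with Spark's implicits in scope.
def readRows[T: RowDecoder](lines: Seq[String], delimiter: String = ";"): Seq[T] =
  lines.map(line => implicitly[RowDecoder[T]].decode(line.split(delimiter)))

val students = readRows[Student](Seq("Alice;CS;9.1", "Bob;Math;8.5"))
// readRows[Person](...) would not compile: there is no RowDecoder[Person],
// which mirrors the original "No implicit arguments of type: Encoder[Person]" error.
```

The key point carried over to the Spark version: keep the Encoder (here RowDecoder) constraint on the generic method all the way up, so the caller who names the concrete case class is the one who supplies the instance.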