Tags: json, scala, hashmap, apache-spark-dataset, jsonlines

How do I read from a json-lines file into a Dataset with an immutable.HashMap?


I have the following case classes:

case class myClass (a: String, b: Boolean, c: Double, d: HashMap[String, E])
case class E (f: String, g: Int)

the following code to load data from a JSON file into a Dataset[myClass]:

mySparkSession.read.schema(Encoders.product[myClass].schema).json("myData.json").as[myClass]

and a JSON-lines file with lines like this:

{"a": "text","b": "false","c": 123456.78,"d": ["text", [{"f": "text"},{"g": 1}]]}

I get the following error while running the code:

failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 331, Column 75: No applicable constructor/method found for actual parameters "java.lang.String, boolean, double, scala.collection.immutable.Map"; candidates are: "my.package.name.objname$myClass(java.lang.String, boolean, double, scala.collection.immutable.HashMap)"

How do I fix this?


Solution

  • Try with this:

    import scala.collection.immutable.Map
    case class myClass (a: String, b: Boolean, c: Double, d: Map[String, E])
    

    I've found that df.as[myClass] is picky about which Map type you declare. Spark's generated deserializer constructs a scala.collection.immutable.Map, which cannot be passed to a constructor expecting the more specific HashMap subtype; that mismatch is exactly what the "No applicable constructor/method found" error is reporting. Declaring the field as Map[String, E] matches what Spark actually produces.
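
  • A minimal end-to-end sketch of the fix, assuming a local SparkSession and the file name myData.json from the question (the object name and master setting are illustrative, not from the original post):

    import scala.collection.immutable.Map
    import org.apache.spark.sql.{Encoders, SparkSession}
    
    // Nested value type, as in the question.
    case class E(f: String, g: Int)
    
    // d is declared as Map[String, E] -- the type Spark's deserializer
    // actually produces -- rather than the HashMap subtype.
    case class myClass(a: String, b: Boolean, c: Double, d: Map[String, E])
    
    object ReadJsonLines {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("read-json-lines")
          .master("local[*]") // assumption: running locally
          .getOrCreate()
        import spark.implicits._
    
        // spark.read.json treats its input as JSON Lines by default:
        // one JSON object per line.
        val ds = spark.read
          .schema(Encoders.product[myClass].schema)
          .json("myData.json")
          .as[myClass]
    
        ds.show(truncate = false)
        spark.stop()
      }
    }

    Note that for d to be read as a map column, each line's d value should be a JSON object keyed by strings, e.g. {"d": {"key1": {"f": "text", "g": 1}}}; an array-shaped d like the sample line in the question will not match the MapType in the schema.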