scala · apache-spark · rdd · flatmap

Unable to find Encoder[Char] when using flatMap with toCharArray in Spark


import spark.implicits._
import org.apache.spark.sql.functions._

var names = Seq("ABC", "XYZ").toDF("names")
var data = names.flatMap(name => name.getString(0).toCharArray)
                .map(rec => (rec, 1))
                .rdd
                .reduce((x, y) => ('S', x._2 + y._2))

ERROR:

    Error:(20, 27) Unable to find encoder for type Char. An implicit Encoder[Char] is needed to store Char instances in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
    var data = names.flatMap(name=>name.getString(0).toCharArray).map(rec=>(rec,1)).rdd.reduce((x,y)=>('S',x._2 + y._2))


Solution

  • You can convert the dataframe to an RDD first, before doing the flatMap and map operations. RDD transformations don't require encoders, so Char works fine there:

    var data = names.rdd
                    .flatMap(name => name.getString(0).toCharArray)
                    .map(rec => (rec, 1))
                    .reduce((x, y) => ('S', x._2 + y._2))
    

    which returns ('S', 6): the second element counts the six characters in the first column of the dataframe, and the 'S' is just a placeholder key produced by the reduce. Not sure if this is your desired output.
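
  • Alternatively, you can stay within the Dataset API by mapping each Char to a single-character String, since spark.implicits._ provides an Encoder[String] (and for (String, Int) tuples). A sketch under that assumption, reusing the names dataframe from the question:

        val data = names.flatMap(name => name.getString(0).toCharArray.map(_.toString))  // Dataset[String]
                        .map(rec => (rec, 1))                                            // Dataset[(String, Int)]
                        .rdd
                        .reduce((x, y) => ("S", x._2 + y._2))

    This returns ("S", 6), matching the RDD version above. If you genuinely need a Dataset of Chars, you could also register an explicit encoder yourself, e.g. implicit val charEncoder = org.apache.spark.sql.Encoders.kryo[Char], though kryo-encoded values are stored as a binary column and are less efficient to query.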