import spark.implicits._
import org.apache.spark.sql.functions._
var names = Seq("ABC", "XYZ").toDF("names")
var data = names
  .flatMap(name => name.getString(0).toCharArray)
  .map(rec => (rec, 1))
  .rdd
  .reduce((x, y) => ('S', x._2 + y._2))
ERROR:
Error:(20, 27) Unable to find encoder for type Char. An implicit Encoder[Char] is needed to store Char instances in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
The RDD API doesn't require encoders, so you can convert the DataFrame to an RDD first, before applying the flatMap and map operations:
var data = names.rdd                                 // RDD[Row]; no encoder needed from here on
  .flatMap(name => name.getString(0).toCharArray)    // split each name into its characters
  .map(rec => (rec, 1))                              // pair each character with a count of 1
  .reduce((x, y) => ('S', x._2 + y._2))              // sum all the counts into a single tuple
which will return ('S', 6), because you're just counting the total number of characters in the first column of the DataFrame ("ABC" and "XYZ" contain six characters combined). Not sure if this is your desired output.
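If what you actually wanted was a count per distinct character rather than one grand total, a reduceByKey variant is the usual pattern. This is a sketch assuming the same `names` DataFrame as above (`charCounts` is just an illustrative name):

```scala
// Per-character counts instead of a single total (hypothetical variant).
val charCounts = names.rdd
  .flatMap(row => row.getString(0).toCharArray) // split each name into characters
  .map(c => (c, 1))                             // pair each character with a count of 1
  .reduceByKey(_ + _)                           // sum the counts per distinct character

// collect() order is not guaranteed; each distinct character maps to its count,
// e.g. ('A',1), ('B',1), ('C',1), ('X',1), ('Y',1), ('Z',1) for this input.
charCounts.collect()
```

Note that reduceByKey lives on pair RDDs (RDD[(K, V)]), so this still goes through .rdd and sidesteps the Char encoder problem entirely.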