Tags: scala, apache-spark, object-files, sequencefile

Spark: how to read a CompactBuffer from an objectFile?


I am reading the following structure from an object file:

(String, CompactBuffer(person1, person2, person3 ...) )

If I try to read it like this:

import scala.collection.mutable.ListBuffer

val input = sc.objectFile[(String, ListBuffer[Person])]("inputFile.txt")

val myData = input.map { t =>
  val myList = t._2
  for (p <- myList) {
    println(p.toString())
  }
  t
}

I got the following error:

java.lang.ClassCastException: org.apache.spark.util.collection.CompactBuffer cannot be cast to scala.collection.mutable.ListBuffer
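The exception comes from type erasure: the type parameters given to objectFile are not checked against the deserialized objects, so the failure only appears when the second tuple element is actually used as a ListBuffer. The pure-Scala sketch below (no Spark needed) reproduces the effect; Vector is only an illustrative stand-in for CompactBuffer, chosen because it is also a Seq but not a ListBuffer, and the object and method names are hypothetical.

```scala
import scala.collection.mutable.ListBuffer

object ErasureDemo {
  // Vector stands in for Spark's private CompactBuffer: a Seq that is not a ListBuffer.
  val stored: Any = ("key", Vector("person1", "person2"))

  // The tuple cast only checks for Tuple2 (type arguments are erased), so it succeeds;
  // the ClassCastException is thrown later, when _2 is actually used as a ListBuffer.
  def listBufferAccessFails(): Boolean = {
    val t = stored.asInstanceOf[(String, ListBuffer[String])]
    try { t._2.append("x"); false }
    catch { case _: ClassCastException => true }
  }

  // Declaring a supertype of the real runtime class (here Seq) works.
  def readAsSeq(): String = {
    val t = stored.asInstanceOf[(String, Seq[String])]
    t._2.mkString(", ")
  }

  def main(args: Array[String]): Unit = {
    println(listBufferAccessFails()) // true
    println(readAsSeq())             // person1, person2
  }
}
```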

However, I cannot use CompactBuffer when reading the objectFile either:

val input = sc.objectFile[(String, CompactBuffer[Person])]("inputFile.txt")

Eclipse just tells me:

class CompactBuffer in package collection cannot be accessed in package org.apache.spark.util.collection

So how do I read such a CompactBuffer from an objectFile? Thank you!


Solution

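  • CompactBuffer is declared private[spark], which is why your code cannot refer to it by name. However, looking at https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/CompactBuffer.scala we can see that CompactBuffer is a subclass of Seq, so try reading it as a Seq instead: val input = sc.objectFile[(String, Seq[Person])]("inputFile.txt")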
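A minimal end-to-end sketch of this approach, assuming the file was produced by groupByKey followed by saveAsObjectFile (the question does not say how it was written). The Person case class, the ReadGroupedObjectFile object, the roundTrip helper and the temporary output path are all hypothetical. It declares the values as Iterable[Person], the type groupByKey itself uses for its values; since CompactBuffer is a Seq it is also an Iterable, so this is a slightly more general variant of the Seq suggestion above.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical stand-in for the question's Person type (its definition is not shown).
case class Person(name: String)

object ReadGroupedObjectFile {
  // Writes grouped data the way the question's file was presumably produced, then reads it back.
  def roundTrip(sc: SparkContext, path: String): Map[String, List[String]] = {
    // groupByKey declares (String, Iterable[Person]); at runtime each value is a CompactBuffer.
    sc.parallelize(Seq("a" -> Person("p1"), "a" -> Person("p2"), "b" -> Person("p3")))
      .groupByKey()
      .saveAsObjectFile(path)

    // Read the values back as a public supertype of CompactBuffer, not as ListBuffer.
    val input = sc.objectFile[(String, Iterable[Person])](path)
    input
      .mapValues(people => people.map(_.name).toList.sorted) // order within a group is not guaranteed
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("read-compactbuffer").setMaster("local[*]"))
    try {
      // saveAsObjectFile fails if the target already exists, so write into a fresh temp directory.
      val path = Files.createTempDirectory("grouped-people").resolve("out").toString
      roundTrip(sc, path).foreach { case (k, names) => println(s"$k -> ${names.mkString(", ")}") }
    } finally sc.stop()
  }
}
```

Because only a public supertype is named, the code never has to mention the private CompactBuffer class, and it keeps working even if Spark changes which concrete buffer groupByKey uses internally.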