Tags: scala, hadoop, apache-spark, dataset, rdd

value toDS is not a member of org.apache.spark.rdd.RDD


I am trying to write a sample Apache Spark program that converts an RDD to a Dataset, but in the process I am getting a compile-time error.

Here is my sample code and the error:

Code:

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import org.apache.spark.sql.Dataset

object Hello {

  case class Person(name: String, age: Int)

  def main(args: Array[String]){
    val conf = new SparkConf()
      .setAppName("first example")
      .setMaster("local")
    val sc = new SparkContext(conf)
    val peopleRDD: RDD[Person] = sc.parallelize(Seq(Person("John", 27)))
    val people = peopleRDD.toDS
  }
}

and my error is:

value toDS is not a member of org.apache.spark.rdd.RDD[Person]

I have added the Spark Core and Spark SQL jars.

My versions are:

Spark 1.6.2

Scala 2.10


Solution

  • Spark version < 2.x

    toDS becomes available on an RDD once you import sqlContext.implicits._:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._
    val people = peopleRDD.toDS()
    
    

  • Spark version >= 2.x

    toDS becomes available once you import spark.implicits._ from a SparkSession:

    import org.apache.spark.sql.SparkSession

    val spark: SparkSession = SparkSession.builder
      .config(conf)
      .getOrCreate()

    import spark.implicits._
    val people = peopleRDD.toDS()
    
    

    HTH
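
Putting the fix together for the asker's versions (Spark 1.6.2, Scala 2.10), the complete program might look like the sketch below. The only changes from the original code are creating a SQLContext and importing its implicits; people.show() is added purely to demonstrate the result:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext

object Hello {

  // Defined at the top level of the object so Spark can derive an encoder for it
  case class Person(name: String, age: Int)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("first example")
      .setMaster("local")
    val sc = new SparkContext(conf)

    // Create the SQLContext and bring its implicits into scope;
    // this import is what adds toDS to RDD[Person]
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val peopleRDD: RDD[Person] = sc.parallelize(Seq(Person("John", 27)))
    val people = peopleRDD.toDS()
    people.show()
  }
}
```

On Spark 2.x and later the same program works with a SparkSession instead: build it with SparkSession.builder, then import spark.implicits._ in place of the sqlContext.implicits._ import.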