Tags: scala, apache-spark, apache-spark-sql, rdd

Creating data frame out of sequence using toDF method in Apache Spark


I am using Spark 2.4.4 and trying to create a DataFrame as shown below.

import org.apache.spark.sql.{Row, SparkSession}

val spark = SparkSession
            .builder
            .master("local[*]")
            .appName("App")
            .getOrCreate()

import spark.sqlContext.implicits._
import spark.implicits._

val justNow = spark.sparkContext.parallelize(
        Seq(Row("1", "One")
           ,Row("2", "Tow")
        )
).toDF

This code is inside my `main` method, but I get an error saying that `toDF` is not a member of `RDD`. Following other Stack Overflow posts, I added the implicits imports shown above, but I still get the error:

error: value toDF is not a member of org.apache.spark.rdd.RDD[org.apache.spark.sql.Row]
possible cause: maybe a semicolon is missing before `value toDF'?
Error occurred in an application involving default arguments. 

Can someone please help? Thanks!


Solution

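  • Use the createDataFrame method instead. The toDF method comes from the implicit conversions in spark.implicits._, and those only apply to RDDs (and local Seqs) whose element type has an implicit Encoder, such as tuples, case classes, and primitive types. There is no implicit Encoder for the generic Row type, so an RDD[Row] has no toDF method. For an RDD[Row], pass an explicit schema to createDataFrame: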

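    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._

    // Explicit schema: one nullable string column per Row field
    val schema = StructType(Seq(
      StructField("col1", StringType),
      StructField("col2", StringType)
    ))

    // createDataFrame(RDD[Row], StructType) applies the schema to each Row
    val rows = spark.sparkContext.parallelize(Seq(Row("1", "One"), Row("2", "Tow")))
    val df = spark.createDataFrame(rows, schema)

    df.show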
    +----+----+
    |col1|col2|
    +----+----+
    |   1| One|
    |   2| Tow|
    +----+----+
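If the data does not have to start out as Row objects, toDF does work once the element type has an implicit Encoder. The sketch below assumes the same local SparkSession setup as the question; the object name ToDFAlternatives, the helper build, and the case class Record are hypothetical names chosen for illustration. It shows three variants: an RDD of tuples, an RDD of case-class instances, and a plain local Seq.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical case class, declared at the top level so that
// spark.implicits._ can derive an Encoder for it.
case class Record(col1: String, col2: String)

object ToDFAlternatives {

  // Builds the same two-row DataFrame three ways (hypothetical helper).
  def build(spark: SparkSession): (DataFrame, DataFrame, DataFrame) = {
    // Brings the toDF conversions and the built-in Encoders into scope
    import spark.implicits._

    // 1. RDD of tuples: tuples have an implicit Encoder, so toDF is available
    val fromTuples: DataFrame = spark.sparkContext
      .parallelize(Seq(("1", "One"), ("2", "Tow")))
      .toDF("col1", "col2")

    // 2. RDD of case class instances: column names come from the field names
    val fromCaseClass: DataFrame = spark.sparkContext
      .parallelize(Seq(Record("1", "One"), Record("2", "Tow")))
      .toDF()

    // 3. No RDD needed: toDF also works directly on a local Seq
    val fromSeq: DataFrame = Seq(("1", "One"), ("2", "Tow")).toDF("col1", "col2")

    (fromTuples, fromCaseClass, fromSeq)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .master("local[*]")
      .appName("App")
      .getOrCreate()

    val (fromTuples, fromCaseClass, fromSeq) = build(spark)
    fromTuples.show()
    fromCaseClass.show()
    fromSeq.show()

    spark.stop()
  }
}
```

The case class is deliberately declared outside main: a case class defined inside a method commonly fails to compile with a "No TypeTag available" error when used with toDF, because Spark needs a TypeTag to derive its Encoder.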