dataframe, apache-spark, spark-shell

Where is toDF in spark-shell, and how do I use it with Vector, Seq, or other collections?


I tried some basic data types:

val x = Vector("John Smith", 10, "Illinois")
val x = Seq("John Smith", 10, "Illinois")
val x = Array("John Smith", 10, "Illinois")
val x = ...
val x = Seq( Vector("John Smith",10,"Illinois"), Vector("Foo",2,"Bar"))

but none of them offers toDF(), even after import spark.implicits._.

My aim is to use something like x.toDF("name","age","city").show.

In the last example toDF does exist, but calling it fails with java.lang.ClassNotFoundException.


NOTES:

  • I am using Spark-shell with Spark v2.2.

  • I need a generic transformation parametrized by column names, as in toDF(names), not a complex solution such as creating a Vector of case class Person(name: String, age: Long, city: String).

The expected result of show after toDF is:

+----------+---+--------+
|      name|age|    city|
+----------+---+--------+
|John Smith| 10|Illinois|
+----------+---+--------+

Solution

  • You should put the values in a tuple to create 3 columns:

    scala> Seq(("John Smith", 10, "Illinois")).toDF("name","age","city").show
    +----------+---+--------+
    |      name|age|    city|
    +----------+---+--------+
    |John Smith| 10|Illinois|
    +----------+---+--------+
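
  • The attempts in the question fail because a mixed-type Vector or Seq is inferred as Vector[Any] / Seq[Any], and spark.implicits._ provides no Encoder for Any, so toDF is either unavailable or blows up at runtime. If the data already sits in a collection of rows and the column names must stay a runtime parameter, one option is to build the DataFrame explicitly from Rows and a schema. Below is a minimal sketch, not the only way to do it; the helper name toDataFrame is mine, it stringifies every value for simplicity, and spark is the SparkSession predefined in spark-shell.

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    // Hypothetical helper: build a DataFrame from rows of Any values and a
    // list of column names. Every column is typed as String here, so each
    // value is converted with toString before going into the Row.
    def toDataFrame(rows: Seq[Vector[Any]], names: Seq[String]) = {
      val schema = StructType(names.map(n => StructField(n, StringType, nullable = true)))
      val rowRDD = spark.sparkContext.parallelize(rows.map(v => Row(v.map(_.toString): _*)))
      spark.createDataFrame(rowRDD, schema)
    }

    toDataFrame(Seq(Vector("John Smith", 10, "Illinois")), Seq("name", "age", "city")).show

    Unlike the tuple version above, this keeps the column names parametrized, at the cost of losing per-column types (age becomes a String column); casting afterwards, e.g. with withColumn and cast, is one way to restore them.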