Tags: scala, apache-spark, apache-spark-sql, rdd, apache-zeppelin

Spark: get a column as a sequence for use in a Zeppelin select form


I have a DataFrame from which I want to select one or more columns as a Seq, to be used as the options of a Zeppelin select form.

This is how the select form works:

[Screenshot: basic select form example]

The select form requires:

required: Iterable[(Object, String)]

What I have so far is:

val test_seq = data.select("file", "id").collect().map(x => (x.get(0), x.get(1).toString)).toSeq

which has the type:

found: Seq[(Any, String)]

That type is not accepted by the form. I have not yet figured out how to get the column(s) out of the DataFrame in the correct format.


Solution

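  • You can try building a tuple of (Object, String) from each collected Row and calling toIterable to get an Iterable[(Object, String)]. The mismatch comes from x.get(0), which returns Any, while the form expects Object (AnyRef):

    val testIter = data.select("file", "id").collect().map(
        // getAs[Object] types the first value as Object instead of Any;
        // getAs[String] assumes the "id" column holds strings
        x => (x.getAs[Object](0), x.getAs[String](1))
    ).toIterable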
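
The type issue can be reproduced without Spark or Zeppelin. The following is a minimal sketch, not the actual Zeppelin API: selectForm is a hypothetical stand-in for a function that expects Iterable[(Object, String)], and each row is modelled as a plain Seq[Any] instead of a Spark Row. It casts the first value to AnyRef (Scala's alias for Object) and keeps toString for the second, which also works if the "id" column is not a string.

```scala
object SelectOptionsSketch {
  // Hypothetical stand-in for a select form that expects Iterable[(Object, String)].
  def selectForm(name: String, options: Iterable[(Object, String)]): String =
    options.map { case (value, label) => s"$value=$label" }.mkString(",")

  // Rows are modelled as Seq[Any] (an assumption; a Spark Row also yields Any from get).
  // Casting to AnyRef turns the element type (Any, String) into (Object, String).
  def toOptions(rows: Seq[Seq[Any]]): Iterable[(Object, String)] =
    rows.map(r => (r(0).asInstanceOf[AnyRef], r(1).toString))

  def main(args: Array[String]): Unit = {
    // Made-up sample data standing in for data.select("file", "id").collect()
    val rows: Seq[Seq[Any]] = Seq(Seq("a.csv", 1L), Seq("b.csv", 2L))
    println(selectForm("file", toOptions(rows)))
  }
}
```

With real Spark rows, the same idea would presumably read x.get(0).asInstanceOf[AnyRef] in place of r(0).asInstanceOf[AnyRef]; the rest of the mapping stays the same.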