Given a Spark DataFrame with the columns "id", "first", "last", and "year":
val df = sc.parallelize(Seq(
  (1, "John", "Doe", 1986),
  (2, "Ive", "Fish", 1990),
  (4, "John", "Wayne", 1995)
)).toDF("id", "first", "last", "year")
and a case class:
case class IdAndLastName(
  id: Int,
  last: String
)
I would like to select only the columns that appear in the case class, namely id and last. In other words, I would like to get the same result as df.select("id", "last"), but by using the case class instead of hardcoding the attribute names. How can I achieve this in a compact way?
You can explicitly create an encoder for the case class (usually this happens implicitly). The encoder's schema then gives you the field names, which you can pass to the select statement:
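import org.apache.spark.sql.Encoders
val fieldnames = Encoders.product[IdAndLastName].schema.fieldNames
// select(col: String, cols: String*) takes the first column name separately
df.select(fieldnames.head, fieldnames.tail: _*).show()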
Output:
+---+-----+
| id| last|
+---+-----+
|  1|  Doe|
|  2| Fish|
|  4|Wayne|
+---+-----+
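If you need this for more than one case class, the same idea can be wrapped in a small generic helper. The following is only a sketch under a few assumptions: the helper name selectAs and the object SelectByCaseClass are made up for illustration, the session is a local one created just for the demo, and every field of the case class is assumed to exist in the DataFrame under the same name and with a compatible type. It also converts the result to a typed Dataset, which goes slightly beyond what the question asks for.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Encoder, Encoders, SparkSession}
import org.apache.spark.sql.functions.col
import scala.reflect.runtime.universe.TypeTag

// Case class from the question; defined at top level so an encoder can be derived.
case class IdAndLastName(id: Int, last: String)

object SelectByCaseClass {

  // Hypothetical helper (not part of Spark): keeps only the columns whose
  // names match the fields of T and returns them as a typed Dataset[T].
  def selectAs[T <: Product : TypeTag](df: DataFrame): Dataset[T] = {
    implicit val encoder: Encoder[T] = Encoders.product[T]
    val columns = encoder.schema.fieldNames.map(name => col(name))
    df.select(columns: _*).as[T]
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for the demo; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("select-by-case-class")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      (1, "John", "Doe", 1986),
      (2, "Ive", "Fish", 1990),
      (4, "John", "Wayne", 1995)
    ).toDF("id", "first", "last", "year")

    // Only "id" and "last" survive, taken from the case class definition.
    selectAs[IdAndLastName](df).show()

    spark.stop()
  }
}
```

Using select with Column values avoids the head/tail split, because that overload accepts a plain varargs list. If the DataFrame lacks one of the case class fields, the select fails with an AnalysisException.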