I'm new to Spark and trying out the Dataset API. I would like to know whether it is possible to extract nested objects from an object using the Dataset API.
For example, let's say I have an object of type A and an object of type B, as below:
case class A(a: String, b: Integer)
case class B(c: Array[A])
I have a dataset containing objects of class B: Dataset[B]
I would like to apply some transformations to get all the objects of type A in my final dataset: Dataset[A]
I tried this, but it does not work:
bs.map(b => b.a.map(x => x))
Does anyone have an idea?
Thanks in advance.
You can first explode B's Array[A] column into rows and then convert the result to a Dataset[A]:
## 'bs' Dataset
+--------------------------+
|c                         |
+--------------------------+
|[[value1, 1], [value2, 2]]|
+--------------------------+
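import org.apache.spark.sql.functions.{col, explode} // $"c" and .as[A] also need import spark.implicits._ (spark being your SparkSession)
val testDF = bs.select(explode($"c"))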
## 'testDF' DataFrame
+-----------+
|        col|
+-----------+
|[value1, 1]|
|[value2, 2]|
+-----------+
val asDF = testDF.withColumn("a", col("col.a")).withColumn("b", col("col.b")).drop("col").as[A]
## 'asDF' Dataset
+------+---+
|     a|  b|
+------+---+
|value1|  1|
|value2|  2|
+------+---+
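If you would rather stay in the typed API, flatMap may be a simpler route: it maps each B to its array of A and flattens the result into a Dataset[A] in one step. Below is a minimal sketch, assuming a local SparkSession; the object name FlattenNested and the helper flatten are made up for illustration and are not part of Spark.

```scala
import org.apache.spark.sql.{Dataset, Encoder, SparkSession}

// Case classes from the question, kept at top level so Spark can derive encoders.
case class A(a: String, b: Integer)
case class B(c: Array[A])

// Hypothetical wrapper object, only here to make the sketch runnable.
object FlattenNested {
  // Turns every B into its nested A elements; flatMap concatenates them into one Dataset[A].
  def flatten(bs: Dataset[B])(implicit enc: Encoder[A]): Dataset[A] =
    bs.flatMap(_.c.toSeq)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("flatten-nested")
      .getOrCreate()
    import spark.implicits._ // provides the encoders for A and B

    val bs: Dataset[B] = Seq(B(Array(A("value1", 1), A("value2", 2)))).toDS()
    val flat: Dataset[A] = flatten(bs)
    flat.show()

    spark.stop()
  }
}
```

Compared with explode, this avoids the intermediate untyped DataFrame and the manual column renaming, at the cost of deserializing each B into a JVM object before flattening.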