
Extract nested objects from an object using the Dataset API in Spark


I'm new to Spark and am trying out the Dataset API. I would like to know whether it is possible to extract nested objects from an object using the Dataset API.
For example, let's say I have a class A and a class B, as below:

case class A(a: String, b: Integer)
case class B(c: Array[A])

I have a dataset containing objects of class B (Dataset[B]), and I would like to apply some transformation to get all the objects of type A into a final dataset (Dataset[A]).
I tried this, but it does not work:

bs.map(b => b.a.map(x => x))

Does anyone have an idea?

Thanks in advance


Solution

  • You can first explode B's Array[A] column into rows, then flatten the resulting struct and convert it to a Dataset[A]:

    ## 'bs' Dataset 
    +--------------------------+
    |c                         |
    +--------------------------+
    |[[value1, 1], [value2, 2]]|
    +--------------------------+
    
    
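    import org.apache.spark.sql.functions.{col, explode}
    import spark.implicits._  // assumes a SparkSession named `spark`
    
    val testDF = bs.select(explode($"c"))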
    
    ## 'testDF' Dataframe
    +-----------+
    |        col|
    +-----------+
    |[value1, 1]|
    |[value2, 2]|
    +-----------+
    
    
    val asDF = testDF.withColumn("a", col("col.a")).withColumn("b", col("col.b")).drop("col").as[A]
    
    ## 'asDF' Dataset
    +------+---+
    |     a|  b|
    +------+---+
    |value1|  1|
    |value2|  2|
    +------+---+
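
As a possible alternative that stays in the typed API, `flatMap` may be closer to what the question was attempting. The attempt with `map` cannot produce a Dataset[A]: B's field is named `c`, not `a`, and `map` returns exactly one element per input row, so mapping over the array yields a Dataset of arrays at best. The sketch below is a minimal, self-contained example; the object name `FlattenNested`, the local-mode session, and the sample data are assumptions made for illustration, not part of the original answer.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Case classes from the question, kept at top level so Spark can derive encoders.
case class A(a: String, b: Integer)
case class B(c: Array[A])

object FlattenNested {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("flatten-nested")
      .getOrCreate()
    import spark.implicits._

    // Sample data mirroring the 'bs' Dataset shown in the answer.
    val bs: Dataset[B] = Seq(B(Array(A("value1", 1), A("value2", 2)))).toDS()

    // flatMap emits every element of each row's array as its own row,
    // so the result is a Dataset[A] rather than a Dataset[Array[A]].
    val flattened: Dataset[A] = bs.flatMap(b => b.c.toSeq)

    flattened.show()
    spark.stop()
  }
}
```

The trade-off is that `flatMap` deserializes each B into a JVM object before flattening, whereas the `explode` approach works on columns directly, so the column-based version may be preferable on large datasets.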