
Unable to explode and select in the same expression in Spark (Scala)


Here is my schema:

root
 |-- DataPartition: string (nullable = true)
 |-- TimeStamp: string (nullable = true)
 |-- TRFCoraxData_instrumentId: long (nullable = true)
 |-- TRFCoraxData_organizationId: long (nullable = true)
 |-- Dividends: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- cr:AnnouncementDate: string (nullable = true)
 |    |    |-- cr:CorporateActionAdjustedDividendGrossAmount: double (nullable = true)
 |    |    |-- cr:CorporateActionAdjustedDividendNetAmount: double (nullable = true)
 |    |    |-- cr:CurrencyId: long (nullable = true)
 |    |    |-- cr:DividendEventId: long (nullable = true)
 |    |    |-- cr:DividendGrossAmount: double (nullable = true)
 |    |    |-- cr:DividendNetAmount: double (nullable = true)
 |    |    |-- cr:DividendType: string (nullable = true)
 |    |    |-- cr:ExDate: string (nullable = true)
 |    |    |-- cr:PayDate: string (nullable = true)
 |    |    |-- cr:PeriodDuration: string (nullable = true)
 |    |    |-- cr:PeriodEndDate: string (nullable = true)
 |    |    |-- cr:RecordDate: string (nullable = true)
 |-- FFAction|!|: string (nullable = true)

I want to explode and select all columns in the same expression, so that I do not have to use withColumn or select and list each column name separately.

Here is the code where I do the explode:

 val temp2 = temp1.select(
   getDataPartition($"DataPartition").as("DataPartition"),
   $"TimeStamp".as("TimeStamp"),
   $"TRFCoraxData_instrumentId".as("TRFCoraxData_instrumentId"),
   $"TRFCoraxData_organizationId".as("TRFCoraxData_organizationId"),
   explode($"Dividends"),
   $"FFAction|!|".as("FFAction|!|"))
 val temp = temp2.select(temp2.columns.map(x => col(x).as(x.replace("cr:", ""))): _*)

 temp.show(false)

Here is the output I get. The exploded data comes back as a single column named col.

How can I also get the column names in the same expression?

+-----------------+-------------------------+-------------------------+---------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
|DataPartition    |TimeStamp                |TRFCoraxData_instrumentId|TRFCoraxData_organizationId|col                                                                                                                                                                                    |FFAction|!||
+-----------------+-------------------------+-------------------------+---------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
|ThirdPartyPrivate|2017-06-07T09:18:33+00:00|8590925624               |4296241518                 |[2009-07-14T00:00:00+00:00,null,0.35,500110,73014469387,0.35,null,INTE,2009-08-13T00:00:00+00:00,2009-09-15T00:00:00+00:00,P3M,2009-09-30T00:00:00+00:00,2009-08-17T00:00:00+00:00]    |O|!|       |
|ThirdPartyPrivate|2017-06-07T09:18:33+00:00|8590925624               |4296241518                 |[2008-02-05T00:00:00+00:00,null,0.3,500110,73015860528,0.3,null,INTE,2008-02-14T00:00:00+00:00,2008-03-17T00:00:00+00:00,P3M,2008-03-31T00:00:00+00:00,2008-02-19T00:00:00+00:00]      |O|!|       |
|ThirdPartyPrivate|2017-06-07T09:18:33+00:00|8590925624               |4296241518                 |[2008-04-29T00:00:00+00:00,null,0.3,500110,73015864496,0.3,null,INTE,2008-05-14T00:00:00+00:00,2008-06-16T00:00:00+00:00,P3M,2008-06-30T00:00:00+00:00,2008-05-16T00:00:00+00:00]      |O|!|       |
+-----------------+-------------------------+-------------------------+---------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+

Solution

  • How can I also get the column names in the same expression?

    col is the default name Spark gives to the column produced by explode. If you want a different name, alias it as you have done for the other columns:

    explode($"Dividends").as("Dividends")
    

    Once the exploded column is aliased as Dividends, you can expand the struct into separate columns in a following select using .*:

    temp2.select(col("Dividends.*"))
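
    Putting the two steps together, a minimal sketch might look like the following. It assumes a trimmed-down version of the schema in the question (only two of the cr: fields), and the object name ExplodeThenExpand and the helpers sample and flatten are made up for illustration. The cr: prefix is stripped at the end with toDF, which renames columns by position.

    ```scala
    import org.apache.spark.sql.{DataFrame, Row, SparkSession}
    import org.apache.spark.sql.functions.{col, explode}
    import org.apache.spark.sql.types._

    object ExplodeThenExpand {
      // Assumption: a reduced stand-in for the question's schema.
      val dividendType: StructType = StructType(Seq(
        StructField("cr:DividendEventId", LongType),
        StructField("cr:DividendType", StringType)))
      val schema: StructType = StructType(Seq(
        StructField("DataPartition", StringType),
        StructField("Dividends", ArrayType(dividendType)),
        StructField("FFAction|!|", StringType)))

      // Hypothetical sample data: one input row carrying two dividends.
      def sample(spark: SparkSession): DataFrame = {
        val rows = Seq(Row("ThirdPartyPrivate",
          Seq(Row(73014469387L, "INTE"), Row(73015860528L, "INTE")), "O|!|"))
        spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
      }

      def flatten(temp1: DataFrame): DataFrame = {
        // Step 1: explode the array and name the generated struct column.
        val temp2 = temp1.select(
          col("DataPartition"),
          explode(col("Dividends")).as("Dividends"),
          col("FFAction|!|"))
        // Step 2: expand the struct fields in a second select.
        val expanded = temp2.select(
          col("DataPartition"), col("Dividends.*"), col("FFAction|!|"))
        // Step 3: drop the "cr:" prefix by renaming every column positionally.
        expanded.toDF(expanded.columns.map(_.replace("cr:", "")): _*)
      }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]").appName("explode-sketch").getOrCreate()
        flatten(sample(spark)).show(false)
        spark.stop()
      }
    }
    ```

    The rename is done after the expansion because the cr: names only become top-level column names once the struct has been expanded.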
    

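    I want to explode and select all columns in the same expression, so that I do not have to use withColumn or select and list each column name separately

    Only one generator, such as explode, can be used per select clause. The exploded column also has to exist before it can be expanded with .*, so the explode and the expansion are written as two steps.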
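
If the goal is mainly to avoid typing every column name, the two steps can be wrapped in a small helper that builds the column list from the DataFrame itself. This is only a sketch: explodeAndFlatten is a hypothetical name, the sample data is invented, and it assumes the array column has a plain name (no dots or special characters) and holds structs.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object ExplodeAllColumns {
  // Hypothetical helper: explode `arrayCol` and expand its struct fields
  // in place, keeping every other column without naming it.
  def explodeAndFlatten(df: DataFrame, arrayCol: String): DataFrame = {
    // Replace the array column with one struct per row.
    val exploded = df.withColumn(arrayCol, explode(col(arrayCol)))
    // Build the projection from the existing column names; backticks
    // protect names such as "FFAction|!|".
    val cols = exploded.columns.map { c =>
      if (c == arrayCol) col(s"$c.*") else col(s"`$c`")
    }
    exploded.select(cols: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("explode-helper-sketch").getOrCreate()
    import spark.implicits._

    // Invented data: tuples become structs with fields _1 and _2.
    val df = Seq(("ThirdPartyPrivate", Seq((1L, "INTE"), (2L, "INTE")), "O|!|"))
      .toDF("DataPartition", "Dividends", "FFAction|!|")

    explodeAndFlatten(df, "Dividends").show(false)
    spark.stop()
  }
}
```

This still runs two projections internally (the withColumn and the select), so it does not get around the one-generator rule. It only hides the column list.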