
Spark DataFrame: converting 144 columns into row


Env: Spark 2.4.0; Scala

I have created a DataFrame from a CSV file that has 144 columns. Is there any way to change all columns except one into rows?

Table A 
|dt  |AA|BB|CC|     
|----|--|--|--|
|1012|10|12|13|        
|1013|13|14|15|
|1014|14|18|30|


Table B (after transforming Table A)

|dt  |Head|Val|
|----|----|---|
|1012|AA  |10|
|1013|AA  |13|
|1014|AA  |14|
|1012|BB  |12|
|1013|BB  |14|
|1014|BB  |18|
|1012|CC  |13|
|1013|CC  |15|
|1014|CC  |30|

I need to transpose/unpivot Table A into Table B. Please note that Table A has 144 columns. I considered the built-in function stack(n, expr1, ..., exprk), but I don't know how to pass that many columns automatically.

I appreciate your time and effort.


Solution

  • You can create the parameter list for stack dynamically using Scala string operations:

    val dfA = Seq((1012, 10, 12, 13), (1013, 13, 14, 15), (1014, 14, 18, 30))
      .toDF("dt", "AA", "BB", "CC")

    // Build the stack() expression from every column except "dt",
    // e.g. stack(3,'AA',AA,'BB',BB,'CC',CC)
    val columns = dfA.columns.filter(!_.equalsIgnoreCase("dt"))
    val stackExpr = columns
      .map(c => s"'$c',$c")
      .mkString(s"stack(${columns.length},", ",", ")")

    // stack() names its output columns col0 and col1; rename them
    val dfB = dfA.selectExpr("dt", stackExpr)
      .withColumnRenamed("col0", "Head")
      .withColumnRenamed("col1", "Val")
    

    Result:

    +----+----+---+
    |  dt|Head|Val|
    +----+----+---+
    |1012|  AA| 10|
    |1012|  BB| 12|
    |1012|  CC| 13|
    |1013|  AA| 13|
    |1013|  BB| 14|
    |1013|  CC| 15|
    |1014|  AA| 14|
    |1014|  BB| 18|
    |1014|  CC| 30|
    +----+----+---+
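
    As a quick sanity check, the expression-building step can be run in plain Scala without a SparkSession; the three sample columns here stand in for the 144 real ones:

```scala
// Build the stack() expression string from a plain column list
// (no Spark required; columns are the sample names from Table A)
val columns = Seq("AA", "BB", "CC")
val stackExpr = columns
  .map(c => s"'$c',$c")
  .mkString(s"stack(${columns.length},", ",", ")")

println(stackExpr)  // stack(3,'AA',AA,'BB',BB,'CC',CC)
```

    With all 144 columns of the real DataFrame, the same code generates the full `stack(143, ...)` expression, so nothing has to be typed out by hand.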