Tags: scala, apache-spark, apache-spark-sql, stack, unpivot

Unpivot in spark-sql/Scala when column names are numbers


I tried the built-in stack function described in the post "Unpivot in spark-sql/pyspark", using Scala. It works for every column whose code contains a letter, but not for columns whose code is purely numeric.

I have a dataframe df that looks like this

I applied stack as described in the linked answer:

val result = df.select($"Id", expr("stack(3, '00C', 00C, '0R5', 0R5, '234', 234)"))

And the result is this one

I want the value in the row for 234 to be 0, which is the value that column actually holds.


Solution

  • 234 is a number, and in SQL selecting a bare number returns that number as the value, so stack receives the literal 234 instead of the column's contents. To tell the parser that 234 is a column name and not a numeric literal, wrap it in backticks (`), i.e. `234`.

    See the code below.

    scala> val df = Seq(("xyz",0,1,0)).toDF("Id","00C","0R5","234")
    df: org.apache.spark.sql.DataFrame = [Id: string, 00C: int ... 2 more fields]
    
    scala> df.select($"Id", expr("stack(3, '00C', 00C, '0R5', 0R5, '234',`234`)")).show(false)
    +---+----+----+
    |Id |col0|col1|
    +---+----+----+
    |xyz|00C |0   |
    |xyz|0R5 |1   |
    |xyz|234 |0   |
    +---+----+----+
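
If the column list is long or generated at runtime, it may be easier to build the stack expression programmatically and backtick-quote every column name, so numeric names never get parsed as literals. The following is only a sketch: the object StackExpr and its helpers quoteIdent and stackExpr are hypothetical names, not part of Spark, and it assumes column names contain no single quotes or backslashes.

```scala
object StackExpr {
  // Quote a name as a SQL identifier with backticks.
  // A backtick inside the name is escaped by doubling it.
  def quoteIdent(name: String): String =
    "`" + name.replace("`", "``") + "`"

  // Build "stack(n, 'c1', `c1`, 'c2', `c2`, ...)" for the given columns.
  // Assumption: names contain no single quotes or backslashes, so they
  // can be placed inside a single-quoted string literal unchanged.
  def stackExpr(cols: Seq[String]): String = {
    val pairs = cols.map(c => s"'$c', ${quoteIdent(c)}")
    s"stack(${cols.size}, ${pairs.mkString(", ")})"
  }
}
```

With the dataframe above, StackExpr.stackExpr(Seq("00C", "0R5", "234")) yields the same expression string as in the answer, except that all three column references are backtick-quoted. It can then be passed to expr inside df.select.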