scala apache-spark apache-spark-sql stack unpivot

Unpivot in spark-sql/Scala when column names are numbers

I have tried the built-in stack function described in the post "Unpivot in spark-sql/pyspark", using Scala. It works fine for columns whose code contains a letter, but not for columns whose code is just a number.

I have a dataframe df that looks like this

I applied as mentioned in the linked answer:

val result = df.select($"Id", expr("stack(3, '00C', 00C, '0R5', 0R5, '234', 234)"))

And the result is this one

What I want is for the value in the 234 row to be 0, as it should be.

Solution

Because 234 is a number, SQL treats it as a numeric literal: selecting a number simply returns that same number as the value. You need to tell the parser that 234 is a column name, not a number. To do that, wrap it in backticks (`), i.e. `234`.

Check the code below.

scala> val df = Seq(("xyz",0,1,0)).toDF("Id","00C","0R5","234")
df: org.apache.spark.sql.DataFrame = [Id: string, 00C: int ... 2 more fields]

scala> df.select($"Id", expr("stack(3, '00C', 00C, '0R5', 0R5, '234',`234`)")).show(false)
+---+----+----+
|Id |col0|col1|
+---+----+----+
|xyz|00C |0   |
|xyz|0R5 |1   |
|xyz|234 |0   |
+---+----+----+
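
If you have many columns and don't want to work out which names need quoting, one option is to backtick-quote every column when building the stack expression. Below is a minimal sketch of that idea. StackExpr, quoteIdent and stackExpr are hypothetical names made up for this example (they are not part of Spark), and it assumes the column names contain no single quotes, since the labels are not escaped.

```scala
// Hypothetical helper, not part of Spark: builds the string passed to expr(...).
object StackExpr {
  // Wrap a column name in backticks so the SQL parser reads it as an
  // identifier, never as a numeric literal. A backtick inside the name
  // is escaped by doubling it.
  def quoteIdent(name: String): String =
    "`" + name.replace("`", "``") + "`"

  // Emit one ('label', `column`) pair per column.
  // Assumption: names contain no single quotes (labels are not escaped).
  def stackExpr(cols: Seq[String]): String = {
    val pairs = cols.map(c => s"'$c', ${quoteIdent(c)}")
    s"stack(${cols.size}, ${pairs.mkString(", ")})"
  }
}
```

With the dataframe above this would be used as df.select($"Id", expr(StackExpr.stackExpr(Seq("00C", "0R5", "234")))), which should build the same expression as the hand-written one, just with all three columns quoted.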