I have a dataframe that looks like this
| Column                                           |
|--------------------------------------------------|
| [{a: 2, b: 4}, {a: 2, b: 3}]                     |
| [{a: 12, b: 14}, {a: 25, b: 33}, {a: 22, b: 31}] |
...
And I need to convert it to a dataframe like this:
| a | b |
|---|---|
| 2 | 4 |
| 2 | 3 |
|12 |14 |
The simplest approach might be to use the Spark SQL function inline, as shown below:
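// spark-shell typically has these in scope already; a standalone app needs them.
import org.apache.spark.sql.functions.inline  // DataFrame function, Spark 3.4+
import spark.implicits._                      // enables toDF and the $"..." syntax

case class AB(a: Int, b: Int)
val df = Seq(
  Seq(AB(2, 4), AB(2, 3)),
  Seq(AB(12, 14), AB(25, 33), AB(22, 31))
).toDF("arrAB")

df.select(inline($"arrAB")).show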
/*
+---+---+
| a| b|
+---+---+
| 2| 4|
| 2| 3|
| 12| 14|
| 25| 33|
| 22| 31|
+---+---+
*/
Note that while inline has been part of the Spark SQL API since 2.0, it is available as a built-in function for DataFrames only on Spark 3.4+. To use it on older Spark versions, wrap it with expr like below:
df.select(expr("inline(arrAB)"))
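
If neither form of inline suits you, another route that should also work on older versions is to explode the array and then select the struct's fields with "ab.*". Below is a minimal self-contained sketch, assuming a local SparkSession. The object name ExplodeStructArray, the helper flattenStructArray, and the alias ab are made up for illustration and are not part of the Spark API.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

// Top-level case class so Spark can derive an encoder for it.
case class AB(a: Int, b: Int)

object ExplodeStructArray {
  // Hypothetical helper: one output row per array element,
  // one output column per struct field.
  def flattenStructArray(df: DataFrame, column: String): DataFrame =
    df.select(explode(col(column)).as("ab")) // one struct per row
      .select("ab.*")                        // expand struct fields into columns

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is fine for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-struct-array")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      Seq(AB(2, 4), AB(2, 3)),
      Seq(AB(12, 14), AB(25, 33), AB(22, 31))
    ).toDF("arrAB")

    flattenStructArray(df, "arrAB").show()
    spark.stop()
  }
}
```

For the sample data this should give the same a and b columns as the inline version. Like inline, explode skips rows whose array is null or empty.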