I have the specific Dataframe
+-------+----------------------+
|BlockId|Entity_BlockNumberList|
+-------+----------------------+
| 1| [[1, 4], [3, 5]]|
| 2| [[1, 4], [3, 5]]|
| 3| [[2, 4], [4, 4]]|
| 4| [[2, 4], [4, 4]]|
| 5| [[2, 4], [3, 5]]|
| 6| [[3, 5], [5, 3]]|
| 7| [[1, 4], [4, 4]]|
| 8| [[3, 5], [4, 4], ...|
| 9| [[1, 4], [2, 4], ...|
+-------+----------------------+
I want to create multiple columns for each element of the nested arrays in the 2nd Column Something like that
BlockId | Entity_BlockNumberList | 1st Array | 2nd Array | ...
Something like explode but in Columns by using Javadoc I have found this code on the internet
val numCols = df
.withColumn("letters_size", size($"letters"))
.agg(max($"letters_size"))
.head()
.getInt(0)
df
.select(
(0 until numCols).map(i => $"letters".getItem(i).as(s"col$i")): _*
)
.show()
which is in scala but i cant quite get how to do the select func especially the (0 until numCols) by using Java.
example: input:
+---------+
| letters|
+---------+
|[a, b, c]|
|[d, e, f]|
| null|
+---------+
expected output:
+----+----+----+
|col0|col1|col2|
+----+----+----+
| a| b| c|
| d| e| f|
|null|null|null|
+----+----+----+
But not hardcoded because my arrays dont have the same length.
I tried that
df.selectExpr(df.select(
expr("concat('struct(',concat_ws(',',transform(sequence(0,max(size(entities))-1),x -> concat('entities[',x,'] as col',x))),') as columns')")
).as(Encoders.STRING()).head()).show();
The output was the follow:
+------------+
| columns|
+------------+
| [1, 3,,]|
| [1, 3,,]|
| [2, 4,,]|
| [2, 4,,]|
| [2, 3,,]|
| [3, 5,,]|
| [1, 4,,]|
|[3, 4, 5, 6]|
| [1, 2, 5,]|
+------------+
The best I could do to achieve what I wanted that works and in nested arrays is do a for like that.
for (int i; i < numCols; i++) {
df = df.withColumn("c" + i, df.col("entities").getItem(i));
}
The output:
+-----+------------+---+---+----+----+
|block| entities| c0| c1| c2| c3|
+-----+------------+---+---+----+----+
| 1| [1, 3]| 1| 3|null|null|
| 2| [1, 3]| 1| 3|null|null|
| 3| [2, 4]| 2| 4|null|null|
| 4| [2, 4]| 2| 4|null|null|
| 5| [2, 3]| 2| 3|null|null|
| 6| [3, 5]| 3| 5|null|null|
| 7| [1, 4]| 1| 4|null|null|
| 8|[3, 4, 5, 6]| 3| 4| 5| 6|
| 9| [1, 2, 5]| 1| 2| 5|null|
+-----+------------+---+---+----+----+
Kinda worked for what i wanted to do next.