I have a DataFrame with two array columns. I am trying to merge these two columns into a single column by joining each pair of values with ":". For example, below, subject and mark should be merged into a column of string type with values like [eng:10,math:20]. Can someone give me some pointers?
// Run in spark-shell, where `spark` and `sc` are predefined
import spark.implicits._
val columns = Array("id", "subject", "mark")
val df1 = sc.parallelize(Seq(
  (1, Array("eng", "math"), Array("10", "20"))
)).toDF(columns: _*)
df1.printSchema
df1.show()
Expected output:
id,newcol
1,[eng:10,math:20]
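Check the code below. ARRAYS_ZIP pairs the two arrays element by element into an array of structs (with fields subject and mark), and TRANSFORM turns each struct into a "subject:mark" string with CONCAT.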
df1.selectExpr(
  "id",
  """
  TRANSFORM(
    ARRAYS_ZIP(subject, mark),
    e -> CONCAT(e.subject, ':', e.mark)
  ) AS newcol
  """
).show(false)
+---+-----------------+
|id |newcol           |
+---+-----------------+
|1  |[eng:10, math:20]|
+---+-----------------+
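Note that newcol above is an array<string> column (show() just renders it with brackets), while the question asks for a single string column. A minimal sketch, assuming Spark 2.4+ (where ARRAY_JOIN is available) and a local SparkSession: wrap the TRANSFORM in ARRAY_JOIN and CONCAT so newcol becomes a plain string. The object name ZipToString and the method run are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

object ZipToString {
  // Rebuilds the question's one-row DataFrame and returns (id, newcol) with newcol as a string.
  def run(spark: SparkSession): Seq[(Int, String)] = {
    import spark.implicits._

    val df1 = Seq((1, Array("eng", "math"), Array("10", "20")))
      .toDF("id", "subject", "mark")

    val result = df1.selectExpr(
      "id",
      """
      CONCAT(
        '[',
        ARRAY_JOIN(
          TRANSFORM(ARRAYS_ZIP(subject, mark), e -> CONCAT(e.subject, ':', e.mark)),
          ','
        ),
        ']'
      ) AS newcol
      """
    )

    // Tuple encoders map columns by position: id -> _1, newcol -> _2
    result.as[(Int, String)].collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("zip-to-string").getOrCreate()
    try run(spark).foreach(println) // (1,[eng:10,math:20])
    finally spark.stop()
  }
}
```

If the two arrays can differ in length, ARRAYS_ZIP pads the shorter one with nulls, CONCAT then yields null for that element, and ARRAY_JOIN silently drops nulls unless you pass its optional third (null replacement) argument.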
Or, with the DataFrame API (the Scala transform function requires Spark 3.0 or later):
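import org.apache.spark.sql.functions.{arrays_zip, concat, lit, transform}

// Zip the arrays into structs, then build "subject:mark" from each struct's fields
val newColExpr = transform(
  arrays_zip($"subject", $"mark"),
  e => concat(e.getField("subject"), lit(":"), e.getField("mark"))
).as("newcol")
df1.select($"id", newColExpr).show(false)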
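On Spark versions before 2.4, ARRAYS_ZIP and TRANSFORM are not available. A sketch of the same per-row logic in plain Scala, which could be wrapped in a UDF (e.g. udf((s: Seq[String], m: Seq[String]) => ZipPairs.zipToPairs(s, m))); the names ZipPairs and zipToPairs are hypothetical. It also produces the final "[...]" string directly.

```scala
object ZipPairs {
  // Plain-Scala counterpart of TRANSFORM(ARRAYS_ZIP(subject, mark), e -> CONCAT(e.subject, ':', e.mark)),
  // followed by joining the pairs into one "[a:1,b:2]" string.
  // Caveat: Scala's zip drops unmatched trailing elements, whereas ARRAYS_ZIP pads with nulls.
  def zipToPairs(subjects: Seq[String], marks: Seq[String]): String =
    subjects
      .zip(marks)
      .map { case (s, m) => s"$s:$m" }
      .mkString("[", ",", "]")

  def main(args: Array[String]): Unit =
    println(zipToPairs(Seq("eng", "math"), Seq("10", "20"))) // [eng:10,math:20]
}
```

A UDF is opaque to Spark's optimizer, so the built-in higher-order functions are preferable whenever the Spark version supports them.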