I am trying to multiply an array-typed column by a scalar. The scalar is a value from another column of the same PySpark DataFrame.
For example, I have this DataFrame:
df = sc.parallelize([([1, 2],3)]).toDF(["l","factor"])
+------+------+
|     l|factor|
+------+------+
|[1, 2]|     3|
+------+------+
What I want to achieve is this:
+------+------+
|     l|factor|
+------+------+
|[3, 6]|     3|
+------+------+
This is what I have tried:
df.withColumn("l", lit("factor") * df.l)
It returns a type mismatch error. How can I multiply an array-typed column by a number?
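Note that lit("factor") creates the string literal "factor" rather than a reference to the factor column, and arithmetic operators do not apply element-wise to arrays, hence the type mismatch.
From Spark 2.4 onward, use the transform higher-order function, which applies an expression to every element of the array: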
df.createOrReplaceTempView("tmp")  # register the DataFrame so SQL can query it as tmp
spark.sql("""select l, factor, transform(l, x -> x * factor) as result from tmp""").show(10, False)
#+------+------+------+
#|l     |factor|result|
#+------+------+------+
#|[1, 2]|3     |[3, 6]|
#+------+------+------+
Using the DataFrame API:
from pyspark.sql.functions import expr
df.withColumn("res", expr("transform(l, x -> x * factor)")).show()
#+------+------+------+
#|     l|factor|   res|
#+------+------+------+
#|[1, 2]|     3|[3, 6]|
#+------+------+------+