I have the following table:
from pyspark.sql import SparkSession, functions as F
spark = SparkSession.builder.getOrCreate()
cols = ['a1', 'a2']
data = [([2, 3], [4, 5]),
        ([1, 3], [2, 4])]
df = spark.createDataFrame(data, cols)
df.show()
# +------+------+
# | a1| a2|
# +------+------+
# |[2, 3]|[4, 5]|
# |[1, 3]|[2, 4]|
# +------+------+
I know how to multiply an array by a scalar. But how do I multiply the members of one array by the corresponding members of another array?
Desired result:
# +------+------+-------+
# | a1| a2| res|
# +------+------+-------+
# |[2, 3]|[4, 5]|[8, 15]|
# |[1, 3]|[2, 4]|[2, 12]|
# +------+------+-------+
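Similarly to your example, you can use transform and access the second array from inside the lambda. When the lambda takes two arguments, the second one is the element's zero-based index, so a2[i] is the element of a2 at the same position as x. This assumes that both arrays have the same length: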
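from pyspark.sql import SparkSession
from pyspark.sql.functions import expr
spark = SparkSession.builder.getOrCreate()
cols = ['a1', 'a2']
data = [([2, 3], [4, 5]),
        ([1, 3], [2, 4])]
df = spark.createDataFrame(data, cols)
# x is each element of a1, i is its index; a2[i] is the matching element of a2
df = df.withColumn("res", expr("transform(a1, (x, i) -> a2[i] * x)"))
df.show()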
# +------+------+-------+
# | a1| a2| res|
# +------+------+-------+
# |[2, 3]|[4, 5]|[8, 15]|
# |[1, 3]|[2, 4]|[2, 12]|
# +------+------+-------+