Multiplication of members of two arrays


I have the following table:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

cols = [  'a1',   'a2']
data = [([2, 3], [4, 5]),
        ([1, 3], [2, 4])]

df = spark.createDataFrame(data, cols)
df.show()
#  +------+------+
#  |    a1|    a2|
#  +------+------+
#  |[2, 3]|[4, 5]|
#  |[1, 3]|[2, 4]|
#  +------+------+

I know how to multiply an array by a scalar. But how do I multiply the members of one array by the corresponding members of another array?

Desired result:

#  +------+------+-------+
#  |    a1|    a2|    res|
#  +------+------+-------+
#  |[2, 3]|[4, 5]|[8, 15]|
#  |[1, 3]|[2, 4]|[2, 12]|
#  +------+------+-------+

Solution

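  • As with multiplying by a scalar, you can use the transform function. Its lambda can take the element index as a second argument, which lets you look up the corresponding element of the 2nd array. This assumes that both arrays have the same length: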

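    from pyspark.sql import SparkSession
    from pyspark.sql.functions import expr
    
    spark = SparkSession.builder.getOrCreate()
    
    cols = ['a1', 'a2']
    data = [([2, 3], [4, 5]),
            ([1, 3], [2, 4])]
    
    df = spark.createDataFrame(data, cols)
    
    # The lambda receives each element x of a1 and its 0-based index i,
    # so a2[i] is the element of a2 at the same position.
    df = df.withColumn("res", expr("transform(a1, (x, i) -> a2[i] * x)"))
    df.show()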
    
    # +------+------+-------+
    # |    a1|    a2|    res|
    # +------+------+-------+
    # |[2, 3]|[4, 5]|[8, 15]|
    # |[1, 3]|[2, 4]|[2, 12]|
    # +------+------+-------+
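
If you would rather not index into the second array, Spark SQL's zip_with (available since Spark 2.4) is one alternative worth considering: it walks both arrays in parallel and passes each pair of elements to the lambda. The sketch below is not part of the original answer. The helper names elementwise_product_expr and with_elementwise_product are hypothetical, and it assumes plain column names that need no backtick quoting.

```python
def elementwise_product_expr(left: str, right: str) -> str:
    """Build a Spark SQL expression that multiplies two array columns pairwise.

    Hypothetical helper for illustration. Assumes `left` and `right` are
    simple column names (no spaces or special characters).
    """
    # zip_with pairs up elements by position; if one array is shorter,
    # Spark pads it with nulls, so the product at those positions is null.
    return f"zip_with({left}, {right}, (x, y) -> x * y)"


def with_elementwise_product(df, left: str, right: str, out: str = "res"):
    """Return df with a new column `out` holding the pairwise product."""
    # Imported here so the expression builder above works without Spark.
    from pyspark.sql.functions import expr

    return df.withColumn(out, expr(elementwise_product_expr(left, right)))
```

With the df from the question, with_elementwise_product(df, "a1", "a2").show() should add the same res column as the transform version. The difference shows up only when the arrays differ in length: zip_with produces one result element per position of the longer array, with null where the shorter array has no element.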