Tags: python, apache-spark, pyspark, rdd

Multiply every element of an RDD by the corresponding element in a list


For example:

from pyspark.mllib.random import RandomRDDs
x = RandomRDDs.normalRDD(sc, size=3, seed=0)

Suppose x looks like this: [-1.3, -2.4, -4.5]. I want to multiply each element of x by a different number from the list [1, 2, 3] and sum the products to get y. Here y equals -1.3*1 + -2.4*2 + -4.5*3.

But I can only do this:

y = x.map(lambda i: i*2).reduce(lambda a, b: a+b)

Here y = -1.3*2 + -2.4*2 + -4.5*2.

How can I replace the 2 in x.map(lambda i: i*2) with a different number for each element?

The result I'm after is what we would usually write in plain Python:

x = [-1.3, -2.4, -4.5]
w = [1, 2, 3]
y = sum(a * b for a, b in zip(x, w))

or

sum([x[i]*w[i] for i in range(len(x))])

Thanks a lot!


Solution

  • I would do this using zipWithIndex and map:

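    from pyspark.mllib.random import RandomRDDs

    x = RandomRDDs.normalRDD(sc, size=3, seed=0)
    # Broadcast the weights so each executor gets a read-only copy
    w = sc.broadcast([1, 2, 3])

    # zipWithIndex yields (element, index) pairs; pick the weight at that index
    x.zipWithIndex().map(lambda v: v[0] * w.value[v[1]]).sum()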
    

    Or, with reduce instead of sum:

    import operator
    x.zipWithIndex().map(lambda v: v[0] * w.value[v[1]]).reduce(operator.add)
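To see what the zipWithIndex and map steps compute, here is a minimal sketch of the same logic in plain Python, with no Spark involved. It assumes the example values from the question for x, and the list comprehension only imitates the (element, index) pairs that zipWithIndex produces; it is an illustration of the idea, not how Spark evaluates the RDD.

```python
import operator
from functools import reduce

# Stand-in for the RDD contents (example values from the question, not real normalRDD output)
x = [-1.3, -2.4, -4.5]
w = [1, 2, 3]

# Imitate zipWithIndex: pair each element with its position as (element, index)
indexed = [(value, index) for index, value in enumerate(x)]

# Imitate the map step: multiply each element by the weight at the same index
products = [value * w[index] for value, index in indexed]

# Both ways of combining the products give the same weighted sum
y_sum = sum(products)
y_reduce = reduce(operator.add, products)

print(y_sum, y_reduce)
```

Both results equal -1.3*1 + -2.4*2 + -4.5*3, up to floating-point rounding. Looking up the weight by index only works when the list has at least as many entries as the RDD has elements.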