scala, apache-spark, rdd, scala-2.10

How to sum part of a list in an RDD


I have an RDD of (key, list) pairs, and for each pair I would like to sum part of the list, producing (key, element2 + element3).

For example, given this input:

(1, List(2.0, 3.0, 4.0, 5.0)), (2, List(1.0, -1.0, -2.0, -3.0))

the output should look like this:

(1, 7.0), (2, -3.0)



Solution

  • You can map over the RDD and index into the second element of each tuple:

    yourRddOfTuples.map { case (key, list) => (key, list(1) + list(2)) }
    

    Update after your comment: convert the second element to a Vector first, then index into it:

    yourRddOfTuples.map { case (key, xs) => val vs = xs.toVector; (key, vs(1) + vs(2)) }
    

    Or if you do not want to use conversions:

    yourRddOfTuples.map { case (key, xs) => (key, xs.drop(1).take(2).sum) }
    

    For each tuple, this skips the first element of the list (.drop(1)), takes the next two (.take(2); fewer if the list is shorter), sums them (.sum), and pairs the result with the key.
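The three variants above can be compared side by side in a minimal sketch that runs without Spark. It assumes the records have type (Int, List[Double]), as in the question's sample data; the object name PartialSum, the alias Record and the helpers byIndex, byVector and byDropTake are hypothetical. A local Seq stands in for the RDD, since RDD.map applies a function to each element the same way Seq.map does.

```scala
// Sketch only: a plain Seq stands in for the RDD so this runs without Spark.
// With Spark, pass the same function to map on an RDD of (key, list) pairs.
object PartialSum {
  // Hypothetical alias for one record: (key, list of values).
  type Record = (Int, List[Double])

  // Sample data from the question.
  val input: Seq[Record] = Seq(
    (1, List(2.0, 3.0, 4.0, 5.0)),
    (2, List(1.0, -1.0, -2.0, -3.0))
  )

  // Index directly; throws IndexOutOfBoundsException if the list has fewer than 3 elements.
  def byIndex(r: Record): (Int, Double) = r match {
    case (key, list) => (key, list(1) + list(2))
  }

  // Convert to a Vector, then index; same result and same failure on short lists.
  def byVector(r: Record): (Int, Double) = r match {
    case (key, list) =>
      val vs = list.toVector
      (key, vs(1) + vs(2))
  }

  // Skip the first element, take up to two more, and sum them; never throws.
  def byDropTake(r: Record): (Int, Double) = r match {
    case (key, list) => (key, list.drop(1).take(2).sum)
  }
}
```

With a real RDD of that type, this would become, for example, yourRddOfTuples.map(PartialSum.byDropTake), which for the question's data should yield (1, 7.0) and (2, -3.0). The variants differ only on short lists: byIndex and byVector throw for a record such as (3, List(1.0, 2.0)), while byDropTake sums whatever is present and returns (3, 2.0).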