scala, apache-spark, rdd, keyvaluepair

How to subtract values when keys are the same in pair RDDs?


I have two pair RDDs of type (Int, BreezeDenseMatrix[Double]), and when the keys are the same I want to subtract their values.

E.g., when I have

RDD_1 : (1, BreezeMatrix_a)

RDD_2: (1, BreezeMatrix_b)

Wanted result: (1, BreezeMatrix_a - BreezeMatrix_b)

I tried join, but it returns (Int, (BreezeMatrix_a, BreezeMatrix_b)) and I don't know how to transform the second part. I can't tell whether it is a set or an array; Spark is not clear about that. Any other ideas?


Solution

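  • Let the result of the join be an RDD whose elements have the form

    joinresult = (Int, (BreezeMatrix_a, BreezeMatrix_b))
    

    The second part is a pair (a Tuple2), not a set or an array, so its two elements are accessed with ._1 and ._2. Then use

    actualresult = joinresult.map( a => (a._1, a._2._1 - a._2._2))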
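
For completeness, here is a minimal end-to-end sketch of the join-then-subtract approach, assuming Spark is run in local mode and Breeze is on the classpath. The object name SubtractValuesExample, the helper subtractValues, and the sample matrices are made up for illustration and are not from the question. It uses mapValues with a pattern match instead of the ._1/._2 accessors, which should be equivalent here.

```scala
import breeze.linalg._
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SubtractValuesExample {
  // Hypothetical helper: for every key present in BOTH RDDs,
  // subtract the right matrix from the left one.
  def subtractValues(
      left: RDD[(Int, DenseMatrix[Double])],
      right: RDD[(Int, DenseMatrix[Double])]
  ): RDD[(Int, DenseMatrix[Double])] =
    // join yields (key, (leftMatrix, rightMatrix)); the value is a Tuple2
    left.join(right).mapValues { case (a, b) => a - b }

  def main(args: Array[String]): Unit = {
    // Local-mode context, assumed only for this illustration
    val conf = new SparkConf().setMaster("local[*]").setAppName("subtract-values")
    val sc = new SparkContext(conf)

    // Made-up sample data: one 2x2 matrix per key
    val rdd1 = sc.parallelize(Seq(1 -> DenseMatrix((5.0, 6.0), (7.0, 8.0))))
    val rdd2 = sc.parallelize(Seq(1 -> DenseMatrix((1.0, 2.0), (3.0, 4.0))))

    // Every entry of the resulting matrix for key 1 is 4.0
    subtractValues(rdd1, rdd2).collect().foreach { case (k, m) =>
      println(s"key $k:\n$m")
    }

    sc.stop()
  }
}
```

Note that join is an inner join, so keys that appear in only one of the two RDDs are dropped from the result. Also, despite the similar name, the built-in subtractByKey does something different: it removes the pairs whose key appears in the other RDD rather than subtracting values.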