scala, apache-spark, spark-graphx

Connecting the first two nodes with an edge from two RDDs in GraphX


I am using GraphX for the first time and want to build a graph incrementally. To start, I need to connect the first two nodes with an edge. I have two RDDs, each holding a single value:

firstRDD: RDD[((Int, Array[Int]), ((VertexId, Array[Int]), Int))]
secondRDD: RDD[((Int, Array[Int]), ((VertexId, Array[Int]), Int))]  

I want to create an edge from the VertexId in firstRDD to the VertexId in secondRDD. I appreciate your help.


Solution

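  • Basically, use map with a case pattern to pick out the VertexId from each RDD, use RDD.zip to pair them up, then use another map to turn each pair into an Edge. The result is an RDD[Edge[Int]], which you can pass to Graph.fromEdges. Note that RDD.zip requires both RDDs to have the same number of partitions and the same number of elements in each partition: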

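    import org.apache.spark.graphx.Edge

    // Pull the VertexId out of each record, pair them up, and build the edge.
    firstRDD.map {
      case (_, ((vertex1, _), _)) => vertex1
    }.zip(
      secondRDD.map {
        case (_, ((vertex2, _), _)) => vertex2
      }
    ).map { case (vertex1, vertex2) => Edge(vertex1, vertex2, 0) } // 0 = placeholder edge attribute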
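
If the two single-value RDDs do not happen to be partitioned identically, zip may fail at runtime. A possible workaround, sketched below, is to read the one VertexId out of each RDD with first() and build the edge on the driver. This is only a sketch under assumptions: the object name ConnectFirstTwo, the helpers vertexIdOf and connect, the Record type alias, and the sample data are all made up for illustration, and it assumes each RDD really holds exactly one element and that a local SparkContext is acceptable for trying it out.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

object ConnectFirstTwo {
  // Record shape copied from the question.
  type Record = ((Int, Array[Int]), ((VertexId, Array[Int]), Int))

  // Hypothetical helper: extract the VertexId from the RDD's first record.
  // first() brings one element back to the driver, which is cheap here
  // because each RDD is assumed to hold a single value.
  def vertexIdOf(rdd: RDD[Record]): VertexId =
    rdd.map { case (_, ((vertexId, _), _)) => vertexId }.first()

  // Hypothetical helper: build a one-edge RDD linking the two vertices.
  // The edge attribute 0 is a placeholder, as in the answer above.
  def connect(firstRDD: RDD[Record], secondRDD: RDD[Record]): RDD[Edge[Int]] = {
    val edge = Edge(vertexIdOf(firstRDD), vertexIdOf(secondRDD), 0)
    firstRDD.sparkContext.parallelize(Seq(edge))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("connect-first-two").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data with vertex ids 1 and 2.
      val firstRDD: RDD[Record] =
        sc.parallelize(Seq(((0, Array(1)), ((1L, Array(1)), 0))))
      val secondRDD: RDD[Record] =
        sc.parallelize(Seq(((0, Array(2)), ((2L, Array(2)), 0))))

      val edges = connect(firstRDD, secondRDD)

      // Graph.fromEdges creates the vertices referenced by the edges and
      // gives each of them the default attribute (0 here).
      val graph: Graph[Int, Int] = Graph.fromEdges(edges, 0)
      graph.triplets.collect().foreach(t => println(s"${t.srcId} -> ${t.dstId}"))
    } finally {
      sc.stop()
    }
  }
}
```

Since the goal is to grow the graph incrementally, the same connect helper could in principle be reused for each new pair of RDDs, with the resulting edge RDDs combined via union before calling Graph.fromEdges. Whether that scales well depends on how many increments there are, so treat it as a starting point rather than a recommendation.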