I have a collection of users, users: RDD[(Long, Vertex)]. I want to create links between my Vertex objects. The rule is: if two Vertex objects have the same value in a selected property, call it prop1, then a link exists between them.
My problem is how to check every pair within the same collection. If I do:
val rels = users.map(
x => users.map(y => if(x._2.prop1 == y._2.prop1){(x._1, y._1)}))
I get back an RDD[RDD[Any]] and not an RDD[(Long, Long)], which is what the Graph needs in order to work.
First of all, you cannot start an action or a transformation from inside another action or transformation, let alone create nested RDDs. So you cannot actually obtain a working RDD[RDD[Any]]: the expression may compile to that type, but it fails as soon as Spark evaluates it.
What you need here is most likely a simple join, roughly equivalent to the following, where T is the type of prop1:
val pairs: RDD[(T, Long)] = users.map { case (id, v) => (v.prop1, id) }
val links: RDD[(Long, Long)] = pairs
  .join(pairs) // join by a common property, equivalent to INNER JOIN in SQL
  .values // drop properties
  .filter { case (v1, v2) => v1 != v2 } // filter self-links
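To see what the self-join produces on a small input, here is a minimal sketch that emulates it with plain Scala collections instead of RDDs, so it runs without a Spark cluster. The Vertex case class, the sample data, and the LinkSketch object are assumptions made up for illustration; only prop1 comes from the question. It also shows that the != filter keeps both directions of every link, while a < filter keeps one tuple per unordered pair.

```scala
object LinkSketch {
  // Hypothetical stand-in for the question's Vertex type; only prop1 matters here.
  final case class Vertex(prop1: String)

  // Made-up sample data standing in for users: RDD[(Long, Vertex)].
  val users: Seq[(Long, Vertex)] = Seq(
    1L -> Vertex("a"),
    2L -> Vertex("a"),
    3L -> Vertex("b"),
    4L -> Vertex("a")
  )

  // Key every id by its property value, as in the RDD version.
  val pairs: Seq[(String, Long)] = users.map { case (id, v) => (v.prop1, id) }

  // Emulate an inner self-join on the key, then drop the key.
  val joined: Seq[(Long, Long)] = for {
    (k1, id1) <- pairs
    (k2, id2) <- pairs
    if k1 == k2
  } yield (id1, id2)

  // != removes self-links but keeps both (1, 2) and (2, 1).
  val directed: Seq[(Long, Long)] = joined.filter { case (a, b) => a != b }

  // < keeps exactly one tuple per unordered pair of ids.
  val undirected: Seq[(Long, Long)] = joined.filter { case (a, b) => a < b }
}
```

Whether you want the directed or the undirected variant depends on how the graph will be used. With real RDDs the same choice applies to the final filter, and the resulting RDD[(Long, Long)] should be usable as input to GraphX's Graph.fromEdgeTuples.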