I am trying to apply a function to the RDD produced by a cartesian product of two RDDs. The function is taken from here, and I have no idea how to make it work on the resulting RDD of pairs.
val combined = rdd_valid.cartesian(rdd1)
combined.collect().foreach(a => println(a))
(abcde,abdce)
(somethin,somthing)
(afghr, decsvt)
My first thought was to do this:
val newRDD = combined.map(Levenshtein.distance)
But it doesn't work.
Assuming combined has the type RDD[(String, String)], and Levenshtein.distance has this signature:
def distance(s1:String, s2:String)
You can apply it as follows:
val newRDD = combined.map { case (s1, s2) => Levenshtein.distance(s1, s2) }
Or, alternatively:
val newRDD = combined.map(t => Levenshtein.distance(t._1, t._2))
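The reason the direct call fails is that map expects a function of one argument (the pair), while distance takes two separate arguments. A third option is to convert the method into a function of a tuple with tupled. The sketch below is only an illustration under stated assumptions: it uses Scala 2 syntax, a plain Seq stands in for the RDD so it runs without Spark, and the Levenshtein object is a stand-in implementation, not the one linked in the question.

```scala
object Levenshtein {
  // Stand-in implementation (assumption): the function linked in the question is not shown here.
  def distance(s1: String, s2: String): Int = {
    // prev(j) holds the edit distance between the processed prefix of s1 and s2.take(j).
    var prev = Array.range(0, s2.length + 1)
    for (i <- 1 to s1.length) {
      val curr = new Array[Int](s2.length + 1)
      curr(0) = i
      for (j <- 1 to s2.length) {
        val cost = if (s1(i - 1) == s2(j - 1)) 0 else 1
        // Minimum over deletion, insertion, and substitution (or match).
        curr(j) = math.min(math.min(prev(j) + 1, curr(j - 1) + 1), prev(j - 1) + cost)
      }
      prev = curr
    }
    prev(s2.length)
  }
}

object TupledExample {
  // A local Seq of pairs stands in for the RDD[(String, String)] (assumption).
  val pairs: Seq[(String, String)] = Seq(("abcde", "abdce"), ("somethin", "somthing"))

  // Turn the two-argument method into a function that takes a single pair.
  val distanceOfPair: ((String, String)) => Int = (Levenshtein.distance _).tupled

  // The same call shape should work on the RDD: combined.map(distanceOfPair)
  val distances: Seq[Int] = pairs.map(distanceOfPair)
}
```

On the actual RDD this would read combined.map((Levenshtein.distance _).tupled). The pattern-matching form above is usually easier to read; tupled is mainly handy when the function is passed around by name.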