I have two RDDs built from the following case classes:
case class Rating(user_ID: Integer, movie_ID: Integer, rating: Integer, timestamp: String)
case class Movie(movie_ID: Integer, title: String, genre: String)
I join them in Scala like this:
val m = datamovie.keyBy(_.movie_ID)
val r = data.keyBy(_.movie_ID)
val mr = m.join(r)
The result has the type RDD[(Int, (Movie, Rating))].
How can I print the titles of the movies that have a rating of 5, for example? I am not quite sure how to work with the new RDD that the join created.
Convert them to Spark DataFrames and perform the join there. Is there a specific reason you want to keep them as RDDs?
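import spark.implicits._ // needed for toDF and $"..."; assumes a SparkSession named spark
val m = datamovie.toDF
val r = data.toDF
// rating is an integer column, so compare against 5 rather than "5"
val mr = m.join(r, Seq("movie_ID"), "left").where($"rating" === 5).select($"title")
mr.show()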
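If you would rather stay with RDDs, the joined RDD can be handled with ordinary filter and map calls, using pattern matching to take the (key, (Movie, Rating)) tuples apart. Below is a minimal sketch, not a definitive implementation. The object name FiveStarTitles, the helper titlesWithRating, and the sample movies and ratings are all made up for illustration; the case classes and the keyBy/join steps are the ones from the question. It assumes a local SparkContext.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

case class Rating(user_ID: Integer, movie_ID: Integer, rating: Integer, timestamp: String)
case class Movie(movie_ID: Integer, title: String, genre: String)

object FiveStarTitles {
  // Hypothetical helper: titles of movies that have at least one rating equal to `wanted`.
  def titlesWithRating(mr: RDD[(Integer, (Movie, Rating))], wanted: Int): RDD[String] =
    mr.filter { case (_, (_, rating)) => rating.rating.intValue == wanted } // keep matching ratings
      .map { case (_, (movie, _)) => movie.title }                          // keep only the title
      .distinct()                                                           // one line per movie

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("five-star-titles"))

    // Made-up sample data standing in for datamovie and data from the question.
    val datamovie = sc.parallelize(Seq(
      Movie(1, "Toy Story", "Animation"),
      Movie(2, "Heat", "Action")))
    val data = sc.parallelize(Seq(
      Rating(10, 1, 5, "978300760"),
      Rating(11, 1, 5, "978302109"),
      Rating(10, 2, 3, "978301968")))

    // Same join as in the question.
    val m = datamovie.keyBy(_.movie_ID)
    val r = data.keyBy(_.movie_ID)
    val mr = m.join(r)

    // collect() brings the titles to the driver so they can be printed there.
    titlesWithRating(mr, 5).collect().foreach(println)

    sc.stop()
  }
}
```

Each element of the joined RDD is one (movie, rating) pair, so a movie with several 5-star ratings shows up several times; that is why the sketch calls distinct() before printing. Drop it if you want one line per rating instead.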