
spark RDD sort by two values


I have an RDD of (name: String, popularity: Int, rank: Int). I want to sort it by rank, and where ranks are equal, by popularity. I am currently doing this with two transformations.

var result = myRDD
        .sortBy(_._2, ascending = false)
        .sortBy(_._3, ascending = false)
        .take(10)

Can I do it in one transformation?


Solution

  • You can make a key-value RDD where the key is a tuple of (rank, popularity) and the value is the name, then sort by that key.

    For example:

    // _._1 - name
    // _._2 - popularity
    // _._3 - rank
    val tupledRDD = myRDD.map(line => ((line._3, line._2), line._1))
        .sortBy(_._1, ascending = false)
        .take(10)
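
    Alternatively (a sketch not in the original answer), you can skip the re-mapping step and pass a composite key function to sortBy directly; it assumes myRDD has the (name, popularity, rank) shape from the question:

    // Assumes myRDD: RDD[(String, Int, Int)] as (name, popularity, rank).
    // Ordering[(Int, Int)] compares the first element (rank) and only falls
    // back to the second (popularity) on ties, so one sortBy is enough.
    val top10 = myRDD
        .sortBy(record => (record._3, record._2), ascending = false)
        .take(10)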