I am currently working on Spark framework using Scala. I have an array C: Array[(int, int)] to which i want to sort according to order of a map A:Map[int,int]. The code is given below:
line 1:import scala.util.Sorting
line 2: object Test {
line 3: def main(args: Array[String]){
line 4: val A = Map(5 -> 41, 1 -> 43, 2 -> 41, 3 -> 59,4 -> 51 )
line 5: val B= A.toList.sortBy(x => x._2)
line 6: B.foreach(println)
line 7: val C=Array((1,15),(2,9),(3,6),(4,3),(5,4))
line 8: Sorting.quickSort(C)(Ordering[(Int)].on(k => (A.get(k._1).get)))
line 9: for(j <- 0 to C.length-1){
line 10: println(C(j))
line 11: }
line 12: }
line 13: }
When I sort the A according the value part (line 5) and print, i get the output (line 6) like this:
(5,41)
(2,41)
(1,43)
(4,51)
(3,59)
But when i sort C according the key part of A (line 8), i get the following output (line 9-11)
(2,9)
(5,4)
(1,15)
(4,3)
(3,6)
As both (5,41) and (2,41) have same value and, in B (5,41) appears before (2,41). But after sorting C according to key part of A, why 2 is appearing before 5. Why is the result not uniform in both B and C?
Please suggest me the changes in the code to get a uniform result in both B and C. I want the C as: (5,4) (2,9) (1,15) (4,3) (3,6)
.
Map
is by-design unordered hence sorting by value
alone may result in nondeterministic order. If your business logic allows changing of the sorting criteria, consider making it consistent by ordering by value + key
on both the Map
and Array
:
import scala.util.Sorting
val A = Map(5 -> 41, 1 -> 43, 2 -> 41, 3 -> 59, 4 -> 51 )
val B = A.toList.sortBy(x => (x._2, x._1))
B
// res1: List[(Int, Int)] = List((2,41), (5,41), (1,43), (4,51), (3,59))
val C = Array((1,15),(2,9),(3,6),(4,3),(5,4))
Sorting.quickSort(C)(Ordering[(Int, Int)].on( k => (A.get(k._1).get, k._1) ))
C
// res23: Array[(Int, Int)] = Array((2,9), (5,4), (1,15), (4,3), (3,6))
As a side note, to avoid crash due to key not found, consider using gerOrElse
with a default (e.g. Int.MaxValue, etc):
Sorting.quickSort(C)(Ordering[(Int, Int)].on( k => (A.getOrElse(k._1, Int.MaxValue), k._1) ))