
Spark GraphX inDegrees Sorting - sortBy vs sortWith


I am trying to sort the vertices of a Spark GraphX graph by in-degree (using Scala).

// Sort ascending - both of the lines below yield the same results

gGraph.inDegrees.collect.sortBy(_._2).take(10)

gGraph.inDegrees.collect.sortWith(_._2 < _._2).take(10)

// Sort descending

gGraph.inDegrees.collect.sortWith(_._2 > _._2).take(10)

gGraph.inDegrees.collect.sortBy(_._2, ascending=false).take(10)     // Doesn't work!

I expected sortBy(_._2, ascending=false) to give the same results as sortWith(_._2 > _._2) above, but I get the error below. Any thoughts would be appreciated. Thanks!

scala> gGraph.inDegrees.collect.sortBy(_._2, ascending=false).take(10)
<console>:55: error: too many arguments for method sortBy: (f: ((org.apache.spark.graphx.VertexId, Int)) => B)(implicit ord: scala.math.Ordering[B])Array[(org.apache.spark.graphx.VertexId, Int)]
       gGraph.inDegrees.collect.sortBy(_._2, ascending=false).take(10)


Solution

  • Because you call .collect first, you are calling .sortBy on an Array, not on an RDD. Array's sortBy takes only the key function (plus an implicit Ordering), so it has no ascending parameter; that parameter belongs to RDD.sortBy.

    You should usually let Spark handle as much of the computation as possible, and only collect (or take) at the very end. Try this:

    gGraph.inDegrees.sortBy(_._2, ascending=false).take(10)
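
If the data has already been collected into an Array, a descending sort is still possible by supplying the ordering yourself. The following is a minimal sketch in plain Scala, not the asker's actual graph: `degrees` is a small hand-written array used as a hypothetical stand-in for the result of `gGraph.inDegrees.collect`, and the other value names are illustrative only.

```scala
// Hypothetical stand-in for gGraph.inDegrees.collect: (vertexId, inDegree) pairs.
val degrees: Array[(Long, Int)] = Array((1L, 3), (2L, 7), (3L, 1), (4L, 5))

// Option 1: pass a reversed Ordering explicitly in sortBy's second
// (normally implicit) parameter list.
val byReverseOrdering = degrees.sortBy(_._2)(Ordering[Int].reverse)

// Option 2: negate the key. In-degrees are non-negative Ints, so this is safe here.
val byNegatedKey = degrees.sortBy(d => -d._2)

// Option 3: the comparator form already used in the question.
val byComparator = degrees.sortWith(_._2 > _._2)
```

All three produce the same descending order for this input. On a large graph, the RDD-level sort shown above is generally preferable, since collecting first pulls every vertex to the driver before sorting.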