Tags: scala, graph, spark-graphx

Scala - Spark: return vertex properties from a particular node


I have a graph and I want to compute the maximum degree. In particular, I want to know all the properties of the vertex with the maximum degree. This is the snippet of code:

import org.apache.spark.graphx.VertexId

def max(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) = {
    if (a._2 > b._2) a else b
} 

val maxDegrees : (VertexId, Int) = graphX.degrees.reduce(max)
// REPL output:
// max: (a: (org.apache.spark.graphx.VertexId, Int), b: (org.apache.spark.graphx.VertexId, Int))(org.apache.spark.graphx.VertexId, Int)
// maxDegrees: (org.apache.spark.graphx.VertexId, Int) = (2063726182,56387)

val startVertexRDD = graphX.vertices.filter{case (hash_id, (id, state)) => hash_id == maxDegrees._1}
startVertexRDD.collect()

But it throws this exception:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 145.0 failed 1 times, most recent failure: Lost task 0.0 in stage 145.0 (TID 5380, localhost, executor driver): scala.MatchError: (1009147972,null) (of class scala.Tuple2)

How can I fix it?


Solution

  • I think the problem is here:

    val startVertexRDD = graphX.vertices.filter{case (hash_id, (id, state)) => hash_id == maxDegrees._1}
    

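    The partial function passed to filter is applied to every element of graphX.vertices, and each element is expected to have this shape:

    (hash_id, (id, state))
    

    The value maxDegrees, (2063726182,56387), is never matched against this pattern; only its first component is used, in the comparison hash_id == maxDegrees._1. The scala.MatchError is raised as soon as an element of graphX.vertices does not fit the (VertexId, (id, state)) shape, for example when its vertex attribute is null instead of a tuple.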

    Be careful with this as well:

    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 145.0 failed 1 times, most recent failure: Lost task 0.0 in stage 145.0 (TID 5380, localhost, executor driver): scala.MatchError: (1009147972,null) (of class scala.Tuple2)
    

    Specifically, this part:

    scala.MatchError: (1009147972,null)
    

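    Vertex 1009147972 has a null attribute, so the inner pattern (id, state) cannot match it and the filter throws the MatchError. In GraphX this typically happens when a vertex appears in the edge RDD but not in the vertex RDD: Graph(vertices, edges) gives such vertices the default attribute, which is null unless a defaultVertexAttr is passed.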

    Hope this helps.
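The failure can be reproduced without Spark. The sketch below is a minimal stand-in, not Spark code: a plain Scala Seq plays the role of graphX.vertices, the (String, String) attribute type and its values are hypothetical placeholders for (id, state), and one vertex carries a null attribute, as 1009147972 does in the question. Destructuring the attribute in the case pattern throws scala.MatchError, while matching the attribute with _ does not.

```scala
object MatchErrorDemo {
  type VertexId = Long // same alias GraphX uses for vertex ids

  // Hypothetical stand-in for graphX.vertices: (vertexId, (id, state)).
  // The second vertex has a null attribute, like vertex 1009147972 in the question.
  val vertices: Seq[(VertexId, (String, String))] = Seq(
    (2063726182L, ("someId", "someState")),
    (1009147972L, null)
  )

  val maxId: VertexId = 2063726182L

  // Destructuring the attribute: the null element does not match (id, state),
  // so filter throws scala.MatchError with the whole tuple (1009147972,null).
  def unsafeFilter(): Seq[(VertexId, (String, String))] =
    vertices.filter { case (hashId, (id, state)) => hashId == maxId }

  // Matching the attribute with `_` accepts null, so only the id is compared.
  def safeFilter(): Seq[(VertexId, (String, String))] =
    vertices.filter { case (hashId, _) => hashId == maxId }

  def main(args: Array[String]): Unit = {
    try unsafeFilter()
    catch { case e: MatchError => println(s"unsafe: ${e.getMessage}") }
    println(s"safe: ${safeFilter()}")
  }
}
```

Applied to the question, the same idea is graphX.vertices.filter { case (hash_id, _) => hash_id == maxDegrees._1 }.collect(), which returns the vertex even if its attribute is null; the null still has to be handled wherever the properties are read.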
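To avoid the null attributes in the first place, and to get the maximum degree together with the vertex properties in one pass, something along these lines could work. It is a sketch under assumptions: spark-core and spark-graphx are on the classpath, the attribute type Props = (String, String) is a hypothetical stand-in for (id, state), and ("unknown", "unknown") is a hypothetical placeholder passed as defaultVertexAttr so that vertices occurring only in edges do not get null. outerJoinVertices attaches each vertex's degree to its properties (0 when the vertex is missing from graph.degrees, which omits isolated vertices), and reduce keeps the vertex with the highest degree.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

object MaxDegreeVertex {
  type Props = (String, String) // hypothetical stand-in for (id, state)

  // Returns (vertexId, properties, degree) for a vertex with the highest degree.
  def maxDegreeVertex(graph: Graph[Props, Int]): (VertexId, Props, Int) =
    graph
      .outerJoinVertices(graph.degrees) { (_, props, degOpt) => (props, degOpt.getOrElse(0)) }
      .vertices
      .map { case (vid, (props, deg)) => (vid, props, deg) }
      .reduce((a, b) => if (a._3 >= b._3) a else b)

  // Small example graph: vertices 3 and 4 appear only in the edges,
  // so they receive the default attribute instead of null.
  def exampleGraph(sc: SparkContext): Graph[Props, Int] = {
    val vertices: RDD[(VertexId, Props)] =
      sc.parallelize(Seq((1L, ("a", "on")), (2L, ("b", "off"))))
    val edges: RDD[Edge[Int]] = sc.parallelize(
      Seq(Edge(1L, 2L, 1), Edge(1L, 3L, 1), Edge(2L, 3L, 1), Edge(1L, 4L, 1)))
    Graph(vertices, edges, defaultVertexAttr = ("unknown", "unknown"))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[1]").setAppName("max-degree"))
    try println(maxDegreeVertex(exampleGraph(sc)))
    finally sc.stop()
  }
}
```

With a real graph, only maxDegreeVertex is needed; if the graph is built elsewhere, passing a defaultVertexAttr to Graph(...) is what keeps attributes like (1009147972,null) out of graphX.vertices.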