scala, apache-spark, spark-graphx

Getting NoSuchMethodError when setting up Spark GraphX graph


I'm getting an error similar to the one encountered here - GraphX runs fine in the Spark shell, but I get a NoSuchMethodError when I use spark-submit on a jar file. This is the line it complains about:

val myGraph: Graph[(String, Long, String), Int] = Graph.apply(userRecords, userConnectionEdges)

which gives me the following error:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.graphx.Graph$.apply$default$4()Lorg/apache/spark/storage/StorageLevel;
        at MyProject$.main(MyProject.scala:53)
        at MyProject.main(MyProject.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
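
As far as I can tell (my own reading of the error, not anything authoritative), Graph$.apply$default$4() is the synthetic method the Scala compiler generates for the fourth default argument of Graph.apply - the edgeStorageLevel: StorageLevel parameter that, I believe, newer GraphX releases added. A jar compiled against a newer GraphX calls that method; an older runtime doesn't have it. A minimal sketch of the encoding, with made-up names:

/* Illustrative only - not Spark code. */
object Builder {
  // A method with a default argument...
  def build(name: String, retries: Int = 3): String = s"$name retries=$retries"

  // ...makes the compiler emit a synthetic companion method, roughly:
  //   def build$default$2: Int = 3
  // A call site compiled as Builder.build("job") invokes build$default$2 at
  // runtime, so it must exist in the version on the classpath; if it doesn't,
  // the JVM throws NoSuchMethodError - the same failure as above.
}

If that reading is right, the jar and the runtime were built against different Spark versions.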

The code builds with sbt assembly, so I'm not sure what is going wrong.

EDIT: I created a new Scala project using the code from here and built it into a jar file. This is the Scala file:

/* GraphTest.scala */

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

object GraphTest {

  def main(args: Array[String]) {

    // Set up environment
    val conf = new SparkConf()
    val sc = new SparkContext(conf)

    // Set up the vertices
    val vertexArray = Array(
      (1L, ("Alice", 28)),
      (2L, ("Bob", 27)),
      (3L, ("Charlie", 65)),
      (4L, ("David", 42)),
      (5L, ("Ed", 55)),
      (6L, ("Fran", 50))
      )

    // Set up the edges
    val edgeArray = Array(
      Edge(2L, 1L, 7),
      Edge(2L, 4L, 2),
      Edge(3L, 2L, 4),
      Edge(3L, 6L, 3),
      Edge(4L, 1L, 1),
      Edge(5L, 2L, 2),
      Edge(5L, 3L, 8),
      Edge(5L, 6L, 3)
      )

    // Convert arrays to RDDs
    val vertexRDD: RDD[(Long, (String, Int))] = sc.parallelize(vertexArray)
    val edgeRDD: RDD[Edge[Int]] = sc.parallelize(edgeArray)

    // Create graph and print vertex data
    val graph: Graph[(String, Int), Int] = Graph(vertexRDD, edgeRDD)

    graph.vertices.filter { case (id, (name, age)) => age > 30 }.collect.foreach {
      case (id, (name, age)) => println(s"$name is $age")
    }
  }
}
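
For what it's worth, when the jar runs successfully this should print the four users older than 30 (collect order may vary):

Charlie is 65
David is 42
Ed is 55
Fran is 50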

Here are the build settings:

import AssemblyKeys._

assemblySettings

name := "graphtest"

version := "1.0"

scalaVersion := "2.10.3"

libraryDependencies += "org.apache.spark" % "spark-graphx_2.10" % "1.2.1" % "provided"
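
The import AssemblyKeys._ line implies the sbt-assembly plugin is declared somewhere in the project; presumably (not shown here, so this is an assumption about the setup) project/plugins.sbt contains something like:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")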

I can run sbt assembly on the code, but when I run

..\spark\bin\spark-submit --class GraphTest target\scala-2.10\graphtest-assembly-1.0.jar

I get the NoSuchMethodError.


Solution

  • It turned out to be a version issue - I was using the SBT and Spark versions from the Databricks training, which are a few releases behind current. Everything works with the latest versions of SBT (0.13.7), Scala (2.10.4), and Spark (1.2.1).

    After I got that working, I encountered this Spark/Hadoop/winutils.exe error. Eventually I got it all working :)
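
    For reference, a sketch of the version pins matching the fix above (my reconstruction - the sbt launcher version lives in project/build.properties, the rest in build.sbt):

    project/build.properties:

    sbt.version=0.13.7

    build.sbt (only the Scala version changes from the settings in the question):

    scalaVersion := "2.10.4"

    libraryDependencies += "org.apache.spark" % "spark-graphx_2.10" % "1.2.1" % "provided"

    The winutils.exe error is a separate Windows-only issue. The usual workaround (my assumption - the post doesn't give details) is to download winutils.exe into a bin folder and point hadoop.home.dir at its parent before creating the SparkContext; the path below is hypothetical:

    // Hypothetical path - adjust to wherever bin\winutils.exe actually lives.
    System.setProperty("hadoop.home.dir", "C:\\hadoop")
    val sc = new SparkContext(new SparkConf())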