Tags: scala, apache-spark, scalatest

What causes a NullPointerException when a SharedSparkContext (sc) is used outside a test function in FunSuite?


The following Scala code works fine, and the test runs:

import org.scalatest._
import com.holdenkarau.spark.testing._

class DummyTest extends FunSuite with SharedSparkContext {
   test("shared context only works inside test functions.") {
     val myRDD = sc.parallelize(List(1,2,3,4))
   }
}

However, the following Scala code results in a java.lang.NullPointerException on the line sc.parallelize:

import org.scalatest._
import com.holdenkarau.spark.testing._

class DummyTest extends FunSuite with SharedSparkContext {
   val myRDD = sc.parallelize(List(1,2,3,4))
   test("shared context only works inside test functions.") {
      assert(true)
   }
}

What causes the NullPointerException when the SparkContext is used outside of the test function?


Solution

  • The SparkContext is declared within SharedSparkContext, but it is not initialized as part of that trait's construction. Rather, it is initialized in the trait's beforeAll() method, which the test framework calls only after the suite has been fully instantiated. Source is here: https://github.com/holdenk/spark-testing-base/blob/master/src/main/pre-2.0/scala/com/holdenkarau/spark/testing/SharedSparkContext.scala. If you use sc while your class is being initialized, beforeAll() has not yet run, so sc is still null, and calling parallelize on it throws the NullPointerException.

    So to summarize, the order is:

    1. Super-class initialization (code just in the trait body)
    2. Sub-class initialization (code just in your class's body)
    3. beforeAll() called
    4. tests run

    So you can use sc in step 4 but not in step 2.
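    A common workaround is to defer the sc.parallelize call until after beforeAll() has run, for example with a lazy val or by building the RDD inside the test body. A minimal sketch under that assumption (the count() check is only illustrative, not part of the original question):

    import org.scalatest._
    import com.holdenkarau.spark.testing._

    class DummyTest extends FunSuite with SharedSparkContext {
      // lazy val defers evaluation until first use inside the test body,
      // by which point beforeAll() has already initialized sc.
      lazy val myRDD = sc.parallelize(List(1, 2, 3, 4))

      test("shared context only works inside test functions.") {
        assert(myRDD.count() == 4)
      }
    }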