scala, apache-spark, pyspark, databricks

Spark UDF throws NullPointerException


I have a Spark UDF that I call from a SQL statement. This is how I have defined and registered the UDF:

UDF:

import org.apache.spark.sql.functions.udf

def udf_temp_emp(_type: String, _name: String): Int = {
  // Look up the key for the given employee name with a per-row query.
  val statement = s"select key from employee where empName = '${_name}'"
  spark.sql(statement).collect()(0).getInt(0)
}

spark.udf.register("udf_temp_emp", udf_temp_emp _)

And this is how I call it in my SQL command:

select
  udf_temp_emp(emp.type, emp.name),
  emp.id
from
  empMaster emp

When I run the above query, it throws the following exception:

SparkException: [FAILED_EXECUTE_UDF] Failed to execute user defined function ($read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$Lambda$13841/67716274: (string, string) => string). Caused by: NullPointerException:


Solution

  • Per werner's comment (which should be an answer, and the accepted one at that), this isn't possible: the SparkContext / SparkSession does not exist on the executor nodes where the UDF runs, so the spark.sql call inside it dereferences a null session and throws the NullPointerException. The only time this "works" is when the query reads a LocalRelation (such as a DataFrame converted from a Seq).

    Rewrite your UDF as a join, assuming that is the actual logic; see the sketch below.
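
    As a rough sketch of that rewrite (assuming empMaster and employee are both registered as tables or temp views, and that the UDF's real purpose is just the key lookup by empName shown in the question), the same result can be produced by a single join:

    // One distributed join instead of a spark.sql call per row,
    // so nothing running on the executors needs a SparkSession.
    val result = spark.sql("""
      select e.key, m.id
      from empMaster m
      left join employee e
        on e.empName = m.name
    """)

    result.show()

    The left join keeps empMaster rows that have no match in employee (key comes back null for them); switch to an inner join if unmatched rows should be dropped instead.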