Tags: scala, apache-spark, databricks, aws-databricks

Get classname of the running Databricks Job


There is an Apache Spark Scala project (runnerProject) that uses another project in the same package (sourceProject). The aim of sourceProject is to provide the name and version of the Databricks job that is currently running.

The problem with the method below is that when it is called from runnerProject, it returns sourceProject's details instead of runnerProject's name and version.

sourceProject's method:

class EnvironmentInfo(appName: String) {

  def getJobDetails(): (String, String) = {
    // getClass resolves to EnvironmentInfo itself, so this reads the
    // manifest of the jar that contains sourceProject
    val classPackage = getClass.getPackage

    val jobName = classPackage.getImplementationTitle
    val jobVersion = classPackage.getImplementationVersion

    (jobName, jobVersion)
  }
}

runnerProject uses sourceProject as a package:

import com.sourceProject.environment.EnvironmentInfo

class runnerProject {
  def start(environment: EnvironmentInfo): Unit = {
    // ... use the job name and version returned by the environment
  }
}
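
For illustration, a minimal entry point on the runner side (the object name RunnerJob is only illustrative) reproduces the issue: the printed title and version belong to sourceProject, not runnerProject.

import com.sourceProject.environment.EnvironmentInfo

object RunnerJob {
  def main(args: Array[String]): Unit = {
    val environment = new EnvironmentInfo("runnerProject")

    // prints sourceProject's Implementation-Title / Implementation-Version
    val (jobName, jobVersion) = environment.getJobDetails()
    println(s"Running $jobName $jobVersion")
  }
}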

How can this be worked around so that getJobDetails() stays in sourceProject (so it can be called from other projects as well, not just runnerProject) while still returning the details of the "caller" job?

Thank you in advance! :)


Solution

  • Try the following: it gets the calling class's name from the stack trace and uses that to look up the actual class and its package.

    class EnvironmentInfo(appName: String) {

      def getJobDetails(): (String, String) = {
        // frame 0 is getStackTrace, frame 1 is getJobDetails, frame 2 is the caller
        val callingClassName = Thread.currentThread.getStackTrace()(2).getClassName
        val classPackage = Class.forName(callingClassName).getPackage

        // manifest attributes of the jar that contains the calling class
        val jobName = classPackage.getImplementationTitle
        val jobVersion = classPackage.getImplementationVersion

        (jobName, jobVersion)
      }
    }
    

    It will work if you call it directly, but it might give you the wrong package if you call it from within a lambda function, because frame 2 of the stack trace may then belong to a synthetic lambda class or a library method rather than to your job's class.
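
  • If the call site may sit inside a lambda or another layer of indirection, a more robust variant (a sketch, not part of the original answer) is to have the caller pass its own class explicitly instead of relying on the stack trace:

    class EnvironmentInfo(appName: String) {

      // the caller passes its own class, e.g. getJobDetails(this.getClass)
      def getJobDetails(callerClass: Class[_]): (String, String) = {
        val classPackage = callerClass.getPackage

        // manifest attributes of the jar that packaged the caller
        val jobName = classPackage.getImplementationTitle
        val jobVersion = classPackage.getImplementationVersion

        (jobName, jobVersion)
      }
    }

    Called from the runner as environment.getJobDetails(this.getClass) (or getClass inside an object), this reads the manifest of whichever jar the caller was packaged into, no matter how deeply the call is nested.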