apache-spark, apache-zeppelin

How does Apache Zeppelin compute the Spark job progress bar?


When you start a Spark job from the Apache Zeppelin notebook interface, it shows a progress bar for the job execution. But what does this progress actually mean? Sometimes it shrinks or expands. Is it the progress of the current stage or of the whole job?


Solution

  • In the web interface, the progress bar shows the value returned by the interpreter's getProgress function (which is not implemented for every interpreter, for example Python).

    This function returns a percentage.

    When using the Spark interpreter, the value seems to be the percentage of completed tasks, computed by the following progress function from JobProgressUtil:

    import org.apache.spark.SparkContext

    def progress(sc: SparkContext, jobGroup: String): Int = {
        // All jobs the status tracker knows about for this job group
        val jobIds = sc.statusTracker.getJobIdsForGroup(jobGroup)
        val jobs = jobIds.flatMap { id => sc.statusTracker.getJobInfo(id) }
        // Every stage of every one of those jobs, not only the running stage
        val stages = jobs.flatMap { job =>
          job.stageIds().flatMap(sc.statusTracker.getStageInfo)
        }

        val taskCount = stages.map(_.numTasks).sum
        val completedTaskCount = stages.map(_.numCompletedTasks).sum
        if (taskCount == 0) {
          0
        } else {
          // Completed tasks as an integer percentage of all known tasks
          (100 * completedTaskCount.toDouble / taskCount).toInt
        }
    }
    

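    In other words, the bar appears to cover every stage of every job in the given job group, not just the current stage. However, I could not find this behavior specified in the Zeppelin documentation.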
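
The ratio used by the progress function may also explain why the bar sometimes shrinks. The sketch below is an illustration, not Zeppelin code: it applies the same arithmetic to plain values. `StageSnapshot` and `ProgressSketch` are hypothetical names standing in for the two stage fields the function reads. It assumes the status tracker starts reporting an additional stage for the same job group while earlier tasks have already finished, so the total task count grows faster than the completed count.

```scala
// Hypothetical stand-in for the two fields of a stage that the
// progress function reads (numTasks and numCompletedTasks).
final case class StageSnapshot(numTasks: Int, numCompletedTasks: Int)

object ProgressSketch {
  // Same arithmetic as the progress function quoted above,
  // applied to plain values instead of a live SparkContext.
  def progress(stages: Seq[StageSnapshot]): Int = {
    val taskCount = stages.map(_.numTasks).sum
    val completedTaskCount = stages.map(_.numCompletedTasks).sum
    if (taskCount == 0) 0
    else (100 * completedTaskCount.toDouble / taskCount).toInt
  }

  def main(args: Array[String]): Unit = {
    // Only one stage is known so far: 8 of its 10 tasks are done.
    val early = Seq(StageSnapshot(numTasks = 10, numCompletedTasks = 8))
    // Assumption: a second stage with 30 pending tasks becomes visible.
    val later = early :+ StageSnapshot(numTasks = 30, numCompletedTasks = 0)

    println(progress(early)) // 80
    println(progress(later)) // 20
  }
}
```

Under that assumption the bar drops from 80 to 20 even though no work was lost, because the denominator is the number of tasks known at the moment of the call.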