Tags: apache-spark, hadoop-yarn, apache-zeppelin

Spark (YARN) applications started by Zeppelin in YARN cluster mode aren't killed after Zeppelin is stopped


I'm running Zeppelin 0.8.1, configured to submit Spark jobs to a YARN 2.7.5 cluster, with interpreters in both cluster mode (i.e. the Application Master runs on YARN rather than on the driver host) and client mode.

The YARN applications started in client mode are killed immediately after I stop the Zeppelin server. The jobs started in cluster mode, however, turn into zombies and hog all of the resources in the YARN cluster (dynamic resource allocation is not enabled).

Is there a way to make Zeppelin kill those jobs on exit, or any other way to solve this problem?


Solution

  • Starting from version 0.8, Zeppelin provides a setting to shut down idle interpreters: zeppelin.interpreter.lifecyclemanager.timeout.threshold.

    See Interpreter Lifecycle Management
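
    In zeppelin-site.xml this looks roughly like the sketch below. The property names and defaults are taken from the Interpreter Lifecycle Management docs (TimeoutLifecycleManager, a 60 s check interval, a 1 h idle threshold); verify them against your Zeppelin version:

    <property>
      <name>zeppelin.interpreter.lifecyclemanager.class</name>
      <value>org.apache.zeppelin.interpreter.lifecycle.TimeoutLifecycleManager</value>
    </property>
    <property>
      <name>zeppelin.interpreter.lifecyclemanager.timeout.checkinterval</name>
      <value>60000</value>    <!-- how often to check for idle interpreters, in ms -->
    </property>
    <property>
      <name>zeppelin.interpreter.lifecyclemanager.timeout.threshold</name>
      <value>3600000</value>  <!-- shut down interpreters idle for 1 hour -->
    </property>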

    Before this, I used a simple shell script that lists the Zeppelin Spark interpreter applications running on YARN and kills any that have been running for more than one hour:

    #!/bin/bash
    # Kill Zeppelin Spark interpreter applications that have been running
    # on YARN for longer than max_life_in_mins.
    max_life_in_mins=60
    
    zeppelinApps=$(yarn application -list 2>/dev/null | grep "RUNNING" | grep "Zeppelin Spark Interpreter" | awk '{print $1}')
    
    for jobId in $zeppelinApps
    do
        # A non-zero Finish-Time means the application already finished; skip it.
        finish_time=$(yarn application -status "$jobId" 2>/dev/null | grep "Finish-Time" | awk '{print $NF}')
        if [ "$finish_time" -ne 0 ]; then
          echo "App $jobId is not running"
          continue
        fi
    
        # Start-Time is reported in milliseconds; convert to seconds and
        # compute how long the application has been running, in minutes.
        start_time=$(yarn application -status "$jobId" 2>/dev/null | grep "Start-Time" | awk '{print $NF}')
        time_diff_in_mins=$(( ( $(date +%s) - start_time / 1000 ) / 60 ))
    
        if [ "$time_diff_in_mins" -gt "$max_life_in_mins" ]; then
          echo "Killing app $jobId"
          yarn application -kill "$jobId"
        fi
    done
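
    To run this periodically, a crontab entry along these lines (the script path and log path are illustrative) could check every 10 minutes:

    */10 * * * * /usr/local/bin/kill-idle-zeppelin-apps.sh >> /var/log/zeppelin-yarn-cleanup.log 2>&1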
    

    There is also the YARN ResourceManager REST API to do the same thing.
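
    For example, a minimal sketch using the ResourceManager's Cluster Application State API (the ResourceManager host and application ID below are placeholders; 8088 is the default RM web port):

    # Ask the ResourceManager to move the application to the KILLED state.
    curl -X PUT -H "Content-Type: application/json" \
         -d '{"state": "KILLED"}' \
         "http://<rm-host>:8088/ws/v1/cluster/apps/<application_id>/state"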