To start with, a bit of context: in my Kubernetes cluster there is a Spark app running, and I want to add a deployment that starts the Spark history server to read the event logs that app writes to a shared volume.
For security reasons on this project I can't use the Spark operator image directly in my Dockerfile, so I install Spark via a conda env and pyspark in the Dockerfile. I also configure the history server through the SPARK_HISTORY_OPTS env var instead of the config file, since the two should be equivalent:
    SPARK_HISTORY_OPTS='-Dspark.history.fs.logDirectory=/execution-events -Dspark.eventLog.dir=/execution-events -Dspark.eventLog.enabled=true -Dspark.history.fs.cleaner.enabled=true -Dspark.history.ui.port=4039'
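For context, the relevant part of the Dockerfile looks roughly like this (a sketch: the base image, env name, and versions are just the ones from my setup, and the ENV line is the one above):

    FROM continuumio/miniconda3
    # Install Spark by pip-installing pyspark into a conda env
    RUN conda create -y -n spark-env-3.1.2 python=3.7 && \
        /opt/conda/envs/spark-env-3.1.2/bin/pip install pyspark==3.1.2
    ENV SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=/execution-events -Dspark.eventLog.dir=/execution-events -Dspark.eventLog.enabled=true -Dspark.history.fs.cleaner.enabled=true -Dspark.history.ui.port=4039"
    COPY entrypoint.sh /entrypoint.sh
    RUN chmod +x /entrypoint.sh
    ENTRYPOINT ["/entrypoint.sh"]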
The shared volume mounted into the deployment uses that same path, /execution-events.
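The volume wiring in the deployment's pod spec is roughly the following fragment (a sketch; the image and the PVC name spark-events-pvc are placeholders, and the Spark app pods mount the same claim so both sides see /execution-events):

    spec:
      containers:
        - name: spark-history-container
          image: my-registry/spark-history:latest  # placeholder
          volumeMounts:
            - name: execution-events
              mountPath: /execution-events  # must match spark.history.fs.logDirectory
      volumes:
        - name: execution-events
          persistentVolumeClaim:
            claimName: spark-events-pvc  # placeholder PVC shared with the Spark app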
My custom entrypoint.sh does a few things (sketched below):
- export SPARK_HOME
- start the Spark history server with a simple: exec /usr/bin/tini -s -- $SPARK_HOME/sbin/start-history-server.sh
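Put together, entrypoint.sh is roughly this sketch (the SPARK_HOME path is simply where pip installed pyspark inside the conda env):

    #!/usr/bin/env bash
    set -e
    # pyspark ships the same sbin/ scripts as a regular Spark distribution
    export SPARK_HOME=/opt/conda/envs/spark-env-3.1.2/lib/python3.7/site-packages/pyspark
    exec /usr/bin/tini -s -- "$SPARK_HOME/sbin/start-history-server.sh"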
When I watch the deployment being created, the pod starts the server, but the container then dies in the Completed state and restarts, ending up in CrashLoopBackOff, which is something I don't understand.
The Spark history server should stay alive until I run the stop-history-server.sh script, so why doesn't it stay up?
Thanks in advance for any answers.
PS: when I add a sleep of around 5 minutes for debugging, exec into the pod, and start the server manually, I see the message that the Spark history server has started, and I can see the files being created in the logs folder.
This is the message in the pod's logs:
    + exec /usr/bin/tini -s -- /opt/conda/envs/spark-env-3.1.2/lib/python3.7/site-packages/pyspark/sbin/start-history-server.sh
    starting org.apache.spark.deploy.history.HistoryServer, logging to /opt/conda/envs/spark-env-3.1.2/lib/python3.7/site-packages/pyspark/logs/spark--org.apache.spark.deploy.history.HistoryServer-1-spark-histor
    Stream closed EOF for ***NAMESPACE***/spark-history-deployment-65dd4dd6f5-wk27t (spark-history-container)
The problem was something I found recently. start-history-server.sh hands off to spark-daemon.sh, which by default forks the server into the background with nohup and then exits; the exec'd entrypoint therefore finishes, Kubernetes sees the container exit (Completed), and the restarts pile up into CrashLoopBackOff. The fix is to set, in entrypoint.sh before starting the server, the env var that tells the daemon script to run the server in the foreground, which keeps the pod alive:

    export SPARK_NO_DAEMONIZE=true

(spark-daemon.sh only checks whether this variable is set, not its value, so my original export SPARK_NO_DAEMONIZE=false happened to work too, but =true is much less confusing.)
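With that, the relevant part of entrypoint.sh becomes:

    #!/usr/bin/env bash
    set -e
    export SPARK_HOME=/opt/conda/envs/spark-env-3.1.2/lib/python3.7/site-packages/pyspark
    # Keep the history server in the foreground so PID 1 never exits
    export SPARK_NO_DAEMONIZE=true
    exec /usr/bin/tini -s -- "$SPARK_HOME/sbin/start-history-server.sh"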
Hope this helps future people running into the same problem.