Tags: java, scala, docker, apache-spark, kubernetes

Spark Kubernetes - FileNotFoundException when copying config files from driver to executors using --files or spark.files


We are migrating our Spark workloads from Cloudera to Kubernetes.

For demo purposes, we wish to run one of our spark jobs within a minikube cluster using spark-submit in cluster mode.

I would like to pass a Typesafe config file to my executors using the spark.files conf (I tried --files as well). The configuration file was copied into the Spark Docker image at build time, under the /opt/spark/conf directory.

Yet when I submit my job, I get a java.io.FileNotFoundException: File file:/opt/spark/conf/application.conf does not exist.

My understanding is that spark.files copies the files from the driver to each executor's working directory.

Am I missing something? Thanks for your help.

Here is my spark-submit command:

spark-submit \
        --master k8s://https://192.168.49.2:8443 \
        --driver-memory ${SPARK_DRIVER_MEMORY} --executor-memory ${SPARK_EXECUTOR_MEMORY} \
        --deploy-mode cluster \
        --class "${MAIN_CLASS}" \
        --conf spark.driver.defaultJavaOptions="-Dconfig.file=local://${POD_CONFIG_DIR}/application.conf $JAVA_ARGS" \
        --conf spark.files="file:///${POD_CONFIG_DIR}/application.conf,file:///${POD_CONFIG_DIR}/tlereg.properties" \
        --conf spark.executor.defaultJavaOptions="-Dconfig.file=local://./application.conf" \
        --conf spark.executor.instances=5 \
        --conf spark.kubernetes.container.image=$SPARK_CONTAINER_IMAGE \
        --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
        --conf spark.kryoserializer.buffer.max=512M \
        --conf spark.driver.maxResultSize=8192M \
        --conf spark.kubernetes.authenticate.caCertFile=$HOME/.minikube/ca.crt \
        --conf spark.executor.extraClassPath="./" \
        local:///path/to/uber/jar.jar \
        "${PROG_ARGS[@]}" > $LOG_FILE 2>&1

Solution

  • I've figured it out. spark-submit sends a request to the Kubernetes API server to create the driver pod. A ConfigMap volume is mounted into the driver pod at mountPath: /opt/spark/conf, which shadows the config files baked into the Docker image at that path. Workaround: change /opt/spark/conf to /opt/spark/config in the Dockerfile so that the configuration files are copied to, and read from, the latter directory.
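    For illustration, a minimal sketch of that Dockerfile change, assuming the config files sit in a conf/ directory of the Docker build context (the source paths are an assumption, not from the original post; the destination file names come from the spark.files conf above):

        # Dockerfile (excerpt): copy the config files to /opt/spark/config rather than
        # /opt/spark/conf, which is shadowed by the ConfigMap volume Spark mounts on the driver pod
        # (source paths under conf/ are assumed for this sketch)
        COPY conf/application.conf   /opt/spark/config/
        COPY conf/tlereg.properties  /opt/spark/config/

    With that in place, POD_CONFIG_DIR in the submit script simply points at /opt/spark/config instead of /opt/spark/conf, and the rest of the spark-submit command stays unchanged.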