Kubernetes Spark Operator Cannot Find JAR File in Image


I’m trying to deploy a SparkApplication using the Kubernetes Spark Operator. I built a custom Docker image for my Spark job, but the driver pod cannot find the JAR file that is included in the image.

Here’s my Dockerfile:

FROM bitnami/spark:3.5.3

WORKDIR /opt/spark/work-dir

COPY target/scala-2.12/app.jar /opt/spark/work-dir/

USER root
RUN chmod 777 /opt/spark/work-dir/app.jar

EXPOSE 8080

I built and pushed the image using these commands:

docker buildx build --platform=linux/amd64 -t repo/image:TAG .
docker push repo/image:TAG

When I inspect the image locally using:

docker run --rm -it repo/image:TAG /bin/bash

I can see that the JAR file exists in the expected directory:

-> pwd
/opt/spark/work-dir
-> ls
app.jar

Next, I deploy the Spark application using this YAML file:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: $APP
  namespace: $NAMESPACE
spec:
  type: Scala
  mode: cluster
  image: repo/image:TAG
  imagePullPolicy: Always
  mainClass: com.org.app.api.Api
  mainApplicationFile: "local:///opt/spark/work-dir/app.jar"
  sparkVersion: "3.5.3"
  driver:
    cores: 2
    memory: "2G"
    serviceAccount: spark
  executor:
    cores: 4
    instances: 2
    memory: "4G"
  sparkConf:
    "spark.kubernetes.container.image.pullPolicy": "Always"
    "spark.kubernetes.namespace": "namespace-name"

However, when I describe the driver pod or check its logs, I see the following error:

Files local:///opt/spark/work-dir/app.jar from /opt/spark/work-dir/app.jar to /opt/spark/work-dir/app.jar
Exception in thread "main" java.nio.file.NoSuchFileException: /opt/spark/work-dir/app.jar

Things I've tried

  1. Verified that the JAR file exists inside the Docker image (docker run confirms this).
  2. Ensured that mainApplicationFile in the SparkApplication YAML points to the correct path (local:///opt/spark/work-dir/app.jar).
  3. Built the image using the linux/amd64 platform to avoid architecture mismatches (my Kubernetes cluster runs on AMD64 nodes).
  4. Used imagePullPolicy: Always to ensure Kubernetes pulls the latest image.
  5. Rebuilt the image with an explicit ENTRYPOINT.
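
Checks 1 and 2 above can be combined into one non-interactive step. The following is a minimal sketch, not part of the original post: it derives the in-container path from the mainApplicationFile URI and, only when asked to, lists that path in the image. The image name and URI are the ones from the question; RUN_IMAGE_CHECK is a hypothetical opt-in variable introduced here so the script does nothing with Docker unless requested.

```shell
#!/bin/sh
# Values copied from the question; adjust to your own image and manifest.
IMAGE="repo/image:TAG"
JAR_URI="local:///opt/spark/work-dir/app.jar"

# Strip the local:// scheme to get the path inside the container.
JAR_PATH="${JAR_URI#local://}"
echo "Expecting JAR at: ${JAR_PATH}"

# Opt-in check (needs Docker and pull access to the image).
# RUN_IMAGE_CHECK is a hypothetical variable used only in this sketch.
# --entrypoint bypasses the image's own entrypoint so that only ls runs.
if [ "${RUN_IMAGE_CHECK:-0}" = "1" ]; then
  docker run --rm --entrypoint ls "$IMAGE" -l "$JAR_PATH"
fi
```

Run it with RUN_IMAGE_CHECK=1 to perform the listing. A successful listing only shows that the file is present in the image. As this question demonstrates, the driver can still fail to find it at runtime.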

Solution

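  • I resolved the issue by copying the Spark application JAR into /opt/bitnami/spark/examples/jars/ instead of /opt/spark/work-dir/ and dropping the WORKDIR line from the Dockerfile (mainApplicationFile in the SparkApplication has to point at the new path as well).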

    Here’s the updated Dockerfile:

    FROM bitnami/spark:3.5.3
    
    COPY target/scala-2.12/qupid-deequ-assembly-0.1.0-SNAPSHOT.jar /opt/bitnami/spark/examples/jars/
    
    USER root
    RUN chmod -R 777 /opt
    
    EXPOSE 8080
    

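    It appears that either the Spark Operator or the Bitnami Spark image treats /opt/bitnami/spark/examples/jars/ as a default or expected location for application JAR files, though I have not found this documented. Note that the failing log line names /opt/spark/work-dir/app.jar as both the source and the destination of a file copy, so the original failure may instead be tied to placing the JAR in the working directory. After making this change, the driver was able to locate the JAR file without any issues.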

    If anyone has insights into why this particular directory is required or documented as a default for Spark applications in this image, I’d be interested to learn more!
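
The answer shows the updated Dockerfile but not the updated manifest. As a small sketch, assuming the only manifest change needed is the JAR path, the matching mainApplicationFile value can be built from the directory and file name in the COPY line. The variable names below are illustrative and do not come from the original post.

```shell
#!/bin/sh
# Directory and JAR name taken from the COPY line of the updated Dockerfile.
JAR_DIR="/opt/bitnami/spark/examples/jars"
JAR_NAME="qupid-deequ-assembly-0.1.0-SNAPSHOT.jar"

# The local:// scheme marks a path that already exists inside the image.
MAIN_APPLICATION_FILE="local://${JAR_DIR}/${JAR_NAME}"

# Print the line as it would appear under spec: in the SparkApplication manifest.
printf '  mainApplicationFile: "%s"\n' "$MAIN_APPLICATION_FILE"
```

The printed line replaces the mainApplicationFile entry from the question's YAML; the rest of the manifest is assumed to stay unchanged.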