I am trying to follow this Spark document to use cluster mode. I deployed Spark to a local Kubernetes cluster in the namespace hm-spark by running:
helm upgrade \
spark \
spark \
--install \
--repo=https://charts.bitnami.com/bitnami \
--namespace=hm-spark \
--create-namespace \
--values=my-values.yaml
where my-values.yaml contains:
image:
registry: docker.io
repository: bitnami/spark
tag: 3.4.0-debian-11-r1
I got the Kubernetes API server address https://127.0.0.1:6443 from:
➜ kubectl cluster-info
Kubernetes control plane is running at https://127.0.0.1:6443
CoreDNS is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Metrics-server is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/https:metrics-server:https/proxy
Then I submitted the Spark application from my macOS machine by running:
spark-submit \
--master=k8s://https://127.0.0.1:6443 \
--deploy-mode=cluster \
--name=spark-pi \
--class=org.apache.spark.examples.SparkPi \
--conf=spark.kubernetes.namespace=hm-spark \
--conf=spark.kubernetes.container.image=docker.io/bitnami/spark:3.4.0-debian-11-r1 \
local:///opt/bitnami/spark/examples/jars/spark-examples_2.12-3.4.0.jar
The spark-pi pod got created:
➜ kubectl get pods --namespace hm-spark
NAME READY STATUS RESTARTS AGE
spark-worker-0 1/1 Running 0 82m
spark-worker-1 1/1 Running 0 82m
spark-master-0 1/1 Running 0 82m
spark-pi-ec6d2e886f483472-driver 0/1 Error 0 9m32s
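To see why the driver pod errored, I pulled its logs:

kubectl logs spark-pi-ec6d2e886f483472-driver --namespace=hm-spark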
They show that the application failed with this error:
spark 00:50:00.00
spark 00:50:00.01 Welcome to the Bitnami spark container
spark 00:50:00.01 Subscribe to project updates by watching https://github.com/bitnami/containers
spark 00:50:00.01 Submit issues and feature requests at https://github.com/bitnami/containers/issues
spark 00:50:00.01
23/05/31 00:50:02 INFO SparkContext: Running Spark version 3.4.0
23/05/31 00:50:02 INFO ResourceUtils: ==============================================================
23/05/31 00:50:02 INFO ResourceUtils: No custom resources configured for spark.driver.
23/05/31 00:50:02 INFO ResourceUtils: ==============================================================
23/05/31 00:50:02 INFO SparkContext: Submitted application: Spark Pi
23/05/31 00:50:02 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
23/05/31 00:50:02 INFO ResourceProfile: Limiting resource is cpu
23/05/31 00:50:02 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/05/31 00:50:02 INFO SecurityManager: Changing view acls to: spark
23/05/31 00:50:02 INFO SecurityManager: Changing modify acls to: spark
23/05/31 00:50:02 INFO SecurityManager: Changing view acls groups to:
23/05/31 00:50:02 INFO SecurityManager: Changing modify acls groups to:
23/05/31 00:50:02 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: spark; groups with view permissions: EMPTY; users with modify permissions: spark; groups with modify permissions: EMPTY
23/05/31 00:50:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/05/31 00:50:02 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
23/05/31 00:50:02 INFO SparkEnv: Registering MapOutputTracker
23/05/31 00:50:02 INFO SparkEnv: Registering BlockManagerMaster
23/05/31 00:50:02 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/05/31 00:50:02 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/05/31 00:50:02 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/05/31 00:50:02 INFO DiskBlockManager: Created local directory at /var/data/spark-a1bd571d-599f-48d6-b7a9-06d35fb82cdb/blockmgr-5ee7264e-b52a-48bc-a1bf-f8f3f6a514aa
23/05/31 00:50:02 INFO MemoryStore: MemoryStore started with capacity 413.9 MiB
23/05/31 00:50:02 INFO SparkEnv: Registering OutputCommitCoordinator
23/05/31 00:50:02 INFO JettyUtils: Start Jetty 0.0.0.0:4040 for SparkUI
23/05/31 00:50:02 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/05/31 00:50:02 INFO SparkContext: Added JAR local:///opt/bitnami/spark/examples/jars/spark-examples_2.12-3.4.0.jar at file:/opt/bitnami/spark/examples/jars/spark-examples_2.12-3.4.0.jar with timestamp 1685494202182
23/05/31 00:50:02 WARN SparkContext: The JAR local:///opt/bitnami/spark/examples/jars/spark-examples_2.12-3.4.0.jar at file:/opt/bitnami/spark/examples/jars/spark-examples_2.12-3.4.0.jar has been added already. Overwriting of added jar is not supported in the current version.
23/05/31 00:50:02 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://spark-master:7077...
23/05/31 00:50:03 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master spark-master:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:322)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:102)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:110)
at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anon$1.run(StandaloneAppClient.scala:108)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.io.IOException: Failed to connect to spark-master/<unresolved>:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:284)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:214)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:226)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:204)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:202)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:198)
... 4 more
Caused by: java.net.UnknownHostException: spark-master
at java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:801)
at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1533)
at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1385)
at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1306)
at java.base/java.net.InetAddress.getByName(InetAddress.java:1256)
at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:156)
at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:153)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:569)
at io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:153)
at io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:41)
at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:61)
at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:53)
at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:55)
at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:31)
at io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:106)
at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:206)
at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:46)
at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:180)
at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:166)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:557)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:625)
at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:105)
at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:990)
at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:516)
at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429)
at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486)
at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
... 1 more
# ...
23/05/31 00:51:02 WARN StandaloneSchedulerBackend: Application ID is not initialized yet.
23/05/31 00:51:02 ERROR StandaloneSchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
23/05/31 00:51:02 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
23/05/31 00:51:02 INFO NettyBlockTransferService: Server created on spark-pi-ec6d2e886f483472-driver-svc.hm-spark.svc:7079
23/05/31 00:51:02 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
23/05/31 00:51:02 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-pi-ec6d2e886f483472-driver-svc.hm-spark.svc, 7079, None)
23/05/31 00:51:02 INFO BlockManagerMasterEndpoint: Registering block manager spark-pi-ec6d2e886f483472-driver-svc.hm-spark.svc:7079 with 413.9 MiB RAM, BlockManagerId(driver, spark-pi-ec6d2e886f483472-driver-svc.hm-spark.svc, 7079, None)
23/05/31 00:51:02 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-pi-ec6d2e886f483472-driver-svc.hm-spark.svc, 7079, None)
23/05/31 00:51:02 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-pi-ec6d2e886f483472-driver-svc.hm-spark.svc, 7079, None)
23/05/31 00:51:03 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
23/05/31 00:51:03 INFO SparkContext: Starting job: reduce at SparkPi.scala:38
23/05/31 00:51:03 INFO SparkContext: SparkContext is stopping with exitCode 0.
23/05/31 00:51:03 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
23/05/31 00:51:03 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
23/05/31 00:51:03 INFO DAGScheduler: Parents of final stage: List()
23/05/31 00:51:03 INFO DAGScheduler: Missing parents: List()
23/05/31 00:51:03 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
23/05/31 00:51:03 INFO SparkUI: Stopped Spark web UI at http://spark-pi-ec6d2e886f483472-driver-svc.hm-spark.svc:4040
23/05/31 00:51:03 INFO TaskSchedulerImpl: Cancelling stage 0
23/05/31 00:51:03 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage cancelled
23/05/31 00:51:03 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) failed in 0.028 s due to Job aborted due to stage failure: Task serialization failed: java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
This stopped SparkContext was created at:
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:1020)
org.apache.spark.examples.SparkPi$.main(SparkPi.scala:30)
org.apache.spark.examples.SparkPi.main(SparkPi.scala)
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.base/java.lang.reflect.Method.invoke(Method.java:568)
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
And here are my Kubernetes services:
➜ kubectl get services --namespace hm-spark
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
spark-headless ClusterIP None <none> <none> 82m
spark-master-svc ClusterIP 10.43.164.158 <none> 7077/TCP,80/TCP 82m
spark-pi-ec6d2e886f483472-driver-svc ClusterIP None <none> 7078/TCP,7079/TCP,4040/TCP 10m
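Note the mismatch: the driver log fails with java.net.UnknownHostException: spark-master, while the master Service is actually named spark-master-svc. Whether a given name resolves inside the namespace can be checked with a throwaway pod (the pod name dns-test below is arbitrary), for example:

kubectl run dns-test --rm -it --restart=Never --namespace=hm-spark --image=busybox:1.36 -- nslookup spark-master-svc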
Any idea? Thanks!
I have no specific experience with these Bitnami helm charts, but it seems to me like your application is trying to use both:

a --master config starting with k8s:// (submitting against the Kubernetes API server), and
a standalone Spark master (the spark:// part of the master URL in your driver log).

That seems like a bit of a mix-up: you should pick one of the two. After having a look at some docs around those Bitnami helm charts, I found this example:
$ ./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.container.image=bitnami/spark:3 \
--master k8s://https://k8s-apiserver-host:k8s-apiserver-port \
--conf spark.kubernetes.driverEnv.SPARK_MASTER_URL=spark://spark-master-svc:spark-master-port \
--deploy-mode cluster \
./examples/jars/spark-examples_2.12-3.2.0.jar 1000
Again, I'm not entirely familiar with these helm charts, but it seems like you might be missing a critical bit of configuration concerning which master will ultimately be used, namely:
--conf spark.kubernetes.driverEnv.SPARK_MASTER_URL=spark://spark-master-svc:spark-master-port
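Untested, but mapping that example onto your setup: your kubectl get services output shows the standalone master exposed as spark-master-svc on port 7077 in the hm-spark namespace, so the submit command would become something like:

spark-submit \
--master=k8s://https://127.0.0.1:6443 \
--deploy-mode=cluster \
--name=spark-pi \
--class=org.apache.spark.examples.SparkPi \
--conf=spark.kubernetes.namespace=hm-spark \
--conf=spark.kubernetes.container.image=docker.io/bitnami/spark:3.4.0-debian-11-r1 \
--conf=spark.kubernetes.driverEnv.SPARK_MASTER_URL=spark://spark-master-svc.hm-spark.svc.cluster.local:7077 \
local:///opt/bitnami/spark/examples/jars/spark-examples_2.12-3.4.0.jar

Using the fully qualified service name (spark-master-svc.hm-spark.svc.cluster.local) would avoid the unresolved spark-master lookup from your driver log; the short name spark-master-svc should also work from within the same namespace.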