apache-spark, kubernetes

Spark 3.1.2: Kubernetes Client Closed Warning Leading to Executor Task Hanging – How to Fix or Work Around?


I’m running Spark 3.1.2 on Kubernetes. Occasionally the logs show the warning "WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed", after which the application hangs: one executor task keeps running indefinitely. This looks similar to SPARK-33349.

Later Spark releases mention an upgrade to the Kubernetes client, but it’s unclear whether this fixes my issue. Upgrading Spark is a complex process, and I’d like to know:

  1. Which version specifically fixes this issue?
  2. Is there a configuration-based workaround that avoids this issue without upgrading?

Solution

  • I resolved the hang by calling persist(MEMORY_AND_DISK) on the DataFrames loaded from the database. The data was apparently not being cached, or was being evicted from memory, so Spark issued additional queries to the database, and one of those queries would hang. It is unclear whether the hang is related to the Kubernetes client eventually failing with the error 'too old resource version'. I hope this helps someone.
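
The persist call described above might look roughly like the following PySpark sketch. It is an illustration under stated assumptions, not the exact code from my job: the helper name load_persisted, the JDBC URL, the table name, and the connection properties are all hypothetical placeholders. Note that persist is lazy, so the rows are only cached once the first action runs on the DataFrame.

```python
def load_persisted(spark, jdbc_url, table, properties, storage_level=None):
    """Read a table over JDBC and mark it for caching in memory, spilling to disk.

    `load_persisted` is a hypothetical helper name used only for this sketch.
    """
    if storage_level is None:
        # Imported lazily so the default is resolved only when it is needed.
        from pyspark import StorageLevel
        storage_level = StorageLevel.MEMORY_AND_DISK

    # DataFrameReader.jdbc builds a DataFrame backed by the database table.
    df = spark.read.jdbc(url=jdbc_url, table=table, properties=properties)

    # persist() only marks the DataFrame; the data is cached on the first action.
    # Later actions then reuse the cached partitions instead of re-querying the database.
    return df.persist(storage_level)


# Example usage (placeholder connection details, assumed for illustration):
#
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   orders = load_persisted(
#       spark,
#       "jdbc:postgresql://db-host:5432/shop",   # hypothetical URL
#       "public.orders",                         # hypothetical table
#       {"user": "reader", "password": "***", "driver": "org.postgresql.Driver"},
#   )
#   orders.count()      # first action: runs the query and fills the cache
#   orders.unpersist()  # release the cached data when it is no longer needed
```

MEMORY_AND_DISK was chosen so that partitions which do not fit in memory are written to local disk rather than dropped and recomputed, since recomputing here means querying the database again.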