
Will the sidecar container shutdown immediately when a Kubernetes pod is shutdown?


I have a sidecar container running alongside a Java service container in a Kubernetes pod. The job of this sidecar container is to transfer the heap dump to a GCP bucket when the Java service container restarts after an OOM crash. This is how the sidecar does it:

- name: heapdump-sidecar
  image: google/cloud-sdk:latest
  command:
    - /bin/sh
    - -c
    - |
      # These values never change, so set them once before the polling loop.
      heap_dump_file="/appdata/java/heapdumps/heapdump.hprof"
      gs_bucket="gs://namespace-non-prod/heap-dump-{{ .Values.environment }}-bucket/"
      # Every 5 seconds, check whether the Java container has written a heap dump.
      while true; do
        if [ -f "$heap_dump_file" ]; then
          echo "Heap dump file found. Uploading to GCS..."
          # Upload under a timestamped name; branch directly on gsutil's exit status.
          if gsutil cp "$heap_dump_file" "$gs_bucket{{ .Chart.Name }}-$(date +%Y-%m-%d-%H-%M-%S).hprof"; then
            echo "Heap dump uploaded successfully."
            # Empty the directory so the same dump is not uploaded again.
            rm -rf /appdata/java/heapdumps/*
          else
            echo "Heap dump upload failed"
          fi
        else
          echo "Heap dump file not found. Waiting..."
        fi
        sleep 5
      done

There are two possible scenarios:

Scenario 1: The pod receives a SIGTERM signal and starts shutting down. The sidecar container shuts down immediately, and the pod waits for the main Java service container to shut down. That takes 10 seconds, so the pod is gone after 10 seconds.

Scenario 2: The pod receives a SIGTERM signal and starts shutting down. The sidecar container does NOT shut down immediately. The Java service container shuts down in 10 seconds, but the pod has to wait for the sidecar container, which is still running. The 30-second grace period runs out and the pod is killed along with the sidecar, so the pod shutdown takes 30 seconds.

Which scenario actually happens?


Solution

  • Your expectation is correct: a container can delay the shutdown of its pod. When a pod is shut down, Kubernetes sends SIGTERM to every container in it, including the sidecar, but a container only stops right away if its process actually exits on that signal.

    As per the official GCP blog by Sandeep Dinesh:

    “Kubernetes waits for a specified time called the termination grace period. By default, this is 30 seconds. It’s important to note that this happens in parallel to the preStop hook and the SIGTERM signal. Kubernetes does not wait for the preStop hook to finish. If your pod usually takes longer than 30 seconds to shut down, make sure you increase the grace period. You can do that by setting the terminationGracePeriodSeconds option in the Pod YAML.”

    As per the Medium blog post by Marko Lukša: “If the sidecar container provides an executable file that waits until the sidecar is ready, you can invoke it in the container’s post-start hook to block the start of the remaining containers in the pod.”

    Note: According to this open GitHub issue comment, this feature is not stable yet, and Kubernetes is still collecting feedback from users.

    Edit 1: Kubernetes waits, for up to the termination grace period, for every container in the pod to exit, including the sidecar. The sidecar's while-true loop runs under /bin/sh as the container's PID 1, and a PID 1 process with no SIGTERM handler ignores the signal, so the loop keeps running. If the main Java container shuts down in 10 seconds while the sidecar is still running, Kubernetes keeps waiting until the grace period (30 seconds by default) runs out and only then forcibly kills the remaining container. Scenario 2 is therefore what happens here.
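Below is a minimal sketch of how the sidecar from the question could exit promptly on SIGTERM, so the pod does not have to wait out the grace period. It assumes the sidecar's shell runs as PID 1 in its container, which is the default when the pod does not set shareProcessNamespace. The java-service container name and image are hypothetical placeholders. Volume mounts are omitted, as in the question. The fragment also shows where terminationGracePeriodSeconds goes in the pod spec.

```yaml
# Hypothetical pod spec fragment (volume mounts omitted, as in the question).
spec:
  # Upper bound on how long Kubernetes waits before SIGKILL; 30 is the default.
  terminationGracePeriodSeconds: 30
  containers:
    - name: java-service                # hypothetical name for the main container
      image: example/java-service:1.0   # placeholder image
    - name: heapdump-sidecar
      image: google/cloud-sdk:latest
      command:
        - /bin/sh
        - -c
        - |
          heap_dump_file="/appdata/java/heapdumps/heapdump.hprof"
          gs_bucket="gs://namespace-non-prod/heap-dump-{{ .Values.environment }}-bucket/"
          sleep_pid=""

          # As PID 1, the shell ignores SIGTERM unless it installs a handler.
          # This handler stops the pending sleep and exits the container.
          on_term() {
            echo "SIGTERM received, exiting."
            if [ -n "$sleep_pid" ]; then kill "$sleep_pid" 2>/dev/null; fi
            exit 0
          }
          trap on_term TERM

          while true; do
            if [ -f "$heap_dump_file" ]; then
              echo "Heap dump file found. Uploading to GCS..."
              if gsutil cp "$heap_dump_file" "$gs_bucket{{ .Chart.Name }}-$(date +%Y-%m-%d-%H-%M-%S).hprof"; then
                echo "Heap dump uploaded successfully."
                rm -rf /appdata/java/heapdumps/*
              else
                echo "Heap dump upload failed"
              fi
            fi
            # Sleep in the background and wait on it: a trapped signal makes
            # "wait" return at once, whereas a foreground "sleep 5" would delay
            # the handler until the sleep finished.
            sleep 5 &
            sleep_pid=$!
            wait "$sleep_pid"
          done
```

With the handler in place, the sidecar should exit almost as soon as it receives SIGTERM. Pod shutdown then depends on the Java container (about 10 seconds in the question), which matches Scenario 1. If SIGTERM arrives while gsutil cp is running in the foreground, the shell runs the handler only after gsutil returns. An in-progress upload is therefore not cut short, but it still has to finish within the grace period.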