kubernetes, google-kubernetes-engine, kubernetes-pvc, kubernetes-deployment

Updating a deployment that uses a ReadWriteOnce volume will fail on mount


My deployment is using a couple of volumes, all defined as ReadWriteOnce.

When applying the deployment to a clean cluster, the pod is created successfully.

However, if I update my deployment (e.g. by changing the container image), the new pod created for the deployment always fails on volume mount:

/Mugen$ kubectl get pods
NAME                            READY     STATUS              RESTARTS   AGE
my-app-556c8d646b-4s2kg         5/5       Running             1          2d
my-app-6dbbd99cc4-h442r         0/5       ContainerCreating   0          39m

/Mugen$ kubectl describe pod my-app-6dbbd99cc4-h442r
      Type     Reason                  Age                 From                                             Message
      ----     ------                  ----                ----                                             -------
      Normal   Scheduled               9m                  default-scheduler                                Successfully assigned my-app-6dbbd99cc4-h442r to gke-my-test-default-pool-671c9db5-k71l
      Warning  FailedAttachVolume      9m                  attachdetach-controller                          Multi-Attach error for volume "pvc-b57e8a7f-1ca9-11e9-ae03-42010a8400a8" Volume is already used by pod(s) my-app-556c8d646b-4s2kg
      Normal   SuccessfulMountVolume   9m                  kubelet, gke-my-test-default-pool-671c9db5-k71l  MountVolume.SetUp succeeded for volume "default-token-ksrbf"
      Normal   SuccessfulAttachVolume  9m                  attachdetach-controller                          AttachVolume.Attach succeeded for volume "pvc-2cc1955a-1cb2-11e9-ae03-42010a8400a8"
      Normal   SuccessfulAttachVolume  9m                  attachdetach-controller                          AttachVolume.Attach succeeded for volume "pvc-2c8dae3e-1cb2-11e9-ae03-42010a8400a8"
      Normal   SuccessfulMountVolume   9m                  kubelet, gke-my-test-default-pool-671c9db5-k71l  MountVolume.SetUp succeeded for volume "pvc-2cc1955a-1cb2-11e9-ae03-42010a8400a8"
      Normal   SuccessfulMountVolume   9m                  kubelet, gke-my-test-default-pool-671c9db5-k71l  MountVolume.SetUp succeeded for volume "pvc-2c8dae3e-1cb2-11e9-ae03-42010a8400a8"
      Warning  FailedMount             52s (x4 over 7m)    kubelet, gke-my-test-default-pool-671c9db5-k71l  Unable to mount volumes for pod "my-app-6dbbd99cc4-h442r_default(affe75e0-1edd-11e9-bb45-42010a840094)": timeout expired waiting for volumes to attach or mount for pod "default"/"my-app-6dbbd99cc4-h442r". list of unmounted volumes=[...]. list of unattached volumes=[...]

What is the best strategy for applying changes to such a deployment, then? Will there have to be some service outage in order to reuse the same persistent volumes? (I wouldn't want to create new volumes; the data must be preserved.)


Solution
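
The straightforward fix is to switch the deployment strategy from RollingUpdate to Recreate: the old pod is then deleted (and its volumes detached) before the new pod is created, so the same ReadWriteOnce volumes can be reused, at the cost of a brief outage on every rollout. A minimal sketch:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      # Terminate the old pod before creating the new one, so the
      # ReadWriteOnce volumes can detach and re-attach cleanly.
      strategy:
        type: Recreate
      ...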

  • I ended up with a better solution. Since all my client pods are only readers of the content, and an independent CI process writes the content, I do the following:

    • From CI: write the content to a Google Cloud Storage bucket, gs://my-storage (see the sketch after this list), then restart all frontend pods
    • In the deployment definition: sync (download) the entire bucket into the pod's volatile storage and serve it from the local file system, which gives the best performance
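
    The CI-side write is a plain gsutil sync of the build output into the bucket; a minimal sketch, where the local ./build directory is a hypothetical stand-in for the actual build output:

    # Push the freshly built content into the bucket (parallel, recursive).
    gsutil -m rsync -r ./build gs://my-storage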

    How to achieve that: in the frontend Docker image, I added the gcloud installation block from https://github.com/GoogleCloudPlatform/cloud-sdk-docker/blob/master/debian_slim/Dockerfile:

    ARG CLOUD_SDK_VERSION=249.0.0
    ENV CLOUD_SDK_VERSION=$CLOUD_SDK_VERSION
    ARG INSTALL_COMPONENTS
    ENV PATH "$PATH:/opt/google-cloud-sdk/bin/"
    RUN apt-get update -qqy && apt-get install -qqy \
            curl \
            gcc \
            python-dev \
            python-setuptools \
            apt-transport-https \
            lsb-release \
            openssh-client \
            git \
            gnupg \
        && easy_install -U pip && \
        pip install -U crcmod && \
        export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)" && \
        echo "deb https://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" > /etc/apt/sources.list.d/google-cloud-sdk.list && \
        curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \
        apt-get update && apt-get install -y google-cloud-sdk=${CLOUD_SDK_VERSION}-0 $INSTALL_COMPONENTS && \
        gcloud config set core/disable_usage_reporting true && \
        gcloud config set component_manager/disable_update_check true && \
        gcloud config set metrics/environment github_docker_image && \
        gcloud --version
    VOLUME ["/root/.config"]
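
    The SDK version is a build argument, so it can be overridden at build time without editing the Dockerfile; for example (the image tag is arbitrary):

    # Build the frontend image, pinning the Cloud SDK version via the build arg.
    docker build --build-arg CLOUD_SDK_VERSION=249.0.0 -t my-frontend .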
    

    And in the deployment manifest frontend.yaml I added the following lifecycle hook:

    ...
    spec:
      ...
      containers:
        ...
        lifecycle:
          postStart:
            exec:
              command: ["gsutil", "-m", "rsync", "-r", "gs://my-storage", "/usr/share/nginx/html"]
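
    Kubernetes will not move the container to the Running state until the postStart hook completes, so the sync finishes before the pod can become Ready. Pairing it with a readiness probe (a sketch, assuming nginx listening on port 80) keeps traffic away until the synced content is actually being served:

        # Container-level field, a sibling of lifecycle above.
        readinessProbe:
          httpGet:
            path: /       # assumes nginx serving the synced content
            port: 80
          initialDelaySeconds: 3
          periodSeconds: 5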
    

    To "refresh" the frontend pods when bucket content is updated, I simply run the following from my CI:

    kubectl set env deployment/frontend K8S_FORCE=$(date +%s)
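
    Setting a dummy environment variable changes the pod template, which is what actually triggers the rolling update. On kubectl 1.15 and newer the same effect is available directly, without the dummy variable:

    # Equivalent restart on kubectl >= 1.15:
    kubectl rollout restart deployment/frontend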