Tags: kubernetes, istio, azure-aks, canary-deployment

How to do a canary upgrade of existing istio customised setup?



Requirements:

  • We have an existing customised setup of istio 1.7.3 (installed using the istioctl method, with no revision set) on AKS 1.18.14.
  • Now we need to upgrade to istio 1.8 with no, or minimal, downtime.
  • The upgrade should be safe and must not break our prod environment in any way.

How we installed the current customised istio environment:

  1. Created the manifest:
istioctl manifest generate --set profile=default -f /manifests/overlay/overlay.yaml > $HOME/generated-manifest.yaml
  2. Installed istio:
istioctl install --set profile=default -f /manifests/overlay/overlay.yaml
  3. Verified istio against the deployed manifest:
istioctl verify-install -f $HOME/generated-manifest.yaml
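
The customisation itself lives in the overlay file passed with -f. The real /manifests/overlay/overlay.yaml is not reproduced here; purely as a hypothetical sketch, such an overlay is an IstioOperator resource along these lines (the file name and the settings below are made up for illustration):

cat <<'EOF' > overlay-example.yaml
# Hypothetical overlay for illustration only; the real overlay holds this setup's customisations.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: default
  meshConfig:
    accessLogFile: /dev/stdout        # example mesh-wide setting
  components:
    pilot:
      k8s:
        hpaSpec:
          minReplicas: 2              # example per-component override
EOF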

Planned upgrade process (reference)

  1. Precheck for the upgrade:
istioctl x precheck
  2. Export the currently used istio configuration to a yaml file using the command below:
kubectl -n istio-system get iop installed-state-install -o yaml > /tmp/iop.yaml
  3. Download the istio 1.8 binary, extract it, and navigate to the directory that contains the 1.8 version of the istioctl binary:
cd istio1.8\istioctl1.8
  4. From the new istio version's directory, create a new control plane for istio (1.8) with a proper revision set, using the previously exported installed-state "iop.yaml":
./istioctl1.8 install --set revision=1-8 --set profile=default -f /tmp/iop.yaml

Expect that this will create a new control plane with the existing customised configuration, and that we will then have two control plane deployments and services running side-by-side:

kubectl get pods -n istio-system -l app=istiod
NAME                                    READY   STATUS    RESTARTS   AGE
istiod-786779888b-p9s5n                 1/1     Running   0          114m
istiod-1-8-6956db645c-vwhsk             1/1     Running   0          1m
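
Besides the pods, each revision should also get its own istiod service and sidecar-injector mutating webhook, so listing those is a quick extra sanity check (this check is our own addition, not part of the referenced procedure):

kubectl get svc -n istio-system
kubectl get mutatingwebhookconfigurations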
  5. After this, we need to change the existing labels of all the cluster namespaces where we need to inject the istio proxy containers: remove the old "istio-injection" label and add the istio.io/rev label pointing to the canary revision 1-8.
kubectl label namespace test-ns istio-injection- istio.io/rev=1-8
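
To double-check which namespaces point at which injector after relabelling, the label columns can be listed; this is just a convenience check we would add, not part of the referenced procedure:

kubectl get namespaces -L istio-injection -L istio.io/rev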

Hopefully, at this point the environment is still stable with the old istio configuration, and we can decide which app pods to restart so they pick up the new control plane, according to our downtime windows; at this point it is also fine to run some apps with the old control plane configs and others with the new control plane configs. e.g.:

kubectl rollout restart deployment -n test-ns  (first) 
kubectl rollout restart deployment -n test-ns2 (later)
kubectl rollout restart deployment -n test-ns3 (again, some time later)
  6. Once we have planned the downtime and restarted the deployments as decided, confirm that all the pods are now using data plane proxies injected by the 1.8 injector only:
kubectl get pods -n test-ns -l istio.io/rev=1-8
  7. To verify that the new pods in the test-ns namespace are using the istiod-canary service corresponding to the canary revision:
istioctl proxy-status | grep ${pod_name} | awk '{print $7}'
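
As an additional spot check of our own (not from the referenced document), the istio-proxy sidecar image tag on the restarted pods can be inspected directly; after the restart it should report a 1.8.x image:

kubectl get pods -n test-ns -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[?(@.name=="istio-proxy")].image}{"\n"}{end}'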
  8. After upgrading both the control plane and the data plane, the old control plane can be uninstalled:
istioctl x uninstall -f /tmp/iop.yaml
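
Before actually removing the old control plane, it may be worth confirming that no proxies are still connected to it; in the istioctl proxy-status output the ISTIOD column should only list pods of the new istiod-1-8 deployment (again, this extra check is our own assumption):

istioctl proxy-status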

Need to clarify the points below before the upgrade.

  1. Are all the upgrade steps prepared above good enough to proceed with for a heavily used prod environment?

  2. Is exporting the installed-state iop enough to capture all the customisations needed to proceed with the canary upgrade, or is there any chance of breaking the upgrade or missing any settings?

  3. Will step 4 above create the 1.8 istio control plane with all the customisations we already have, without breaking or missing anything?

  4. After step 4, do we need to do any extra configuration related to the istiod service? The document we followed is not clear about that.

  5. For step 5 above, how can we identify all the namespaces where istio-injection is enabled, modify only those namespaces, and leave the others as they were before?

  6. For step 8 above, how do we ensure we are uninstalling the old control plane only? Do we have to get the binary for the old control plane (1.7 in my case) and use that binary with the same exported /tmp/iop.yaml?

  7. No idea how to roll back from any issues that happen in between, before or after the old control plane is deleted.


Solution

    1. No. You should go through the changelog and upgrade notes. See what's new, what's changed, deprecated etc. Adjust your configs accordingly.

    2. In theory - yes, in practice - no. See above. That's why you should always check the upgrade notes/changelog and plan accordingly. There is always a slim chance something will go wrong.

    3. It should, but again, be prepared that something may break (one more time - go through the changelog/upgrade notes, this is important).

    4. No.

    5. You can find all namespaces with Istio injection enabled with:

    kubectl get namespaces -l=istio-injection=enabled
    

    The Istio upgrade process should only modify namespaces with injection enabled (and the istio-system namespace).
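
    If you want to flip all of those namespaces to the new revision in one go, a minimal sketch (assuming the revision name 1-8 used above) could look like the loop below - review the resulting namespace list before running it in prod:

    # Relabel every namespace that currently has istio-injection enabled
    for ns in $(kubectl get namespaces -l istio-injection=enabled -o jsonpath='{.items[*].metadata.name}'); do
      kubectl label namespace "$ns" istio-injection- istio.io/rev=1-8
    done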

    6. If your old control plane does not have a revision label, you have to uninstall it using its original installation options (old yaml file):
    istioctl x uninstall -f /path/to/old/config.yaml
    

    If it does have a revision label:

    istioctl x uninstall --revision <revision>
    
    7. You can just uninstall the new control plane with:
    istioctl x uninstall --revision=1-8
    

    This will revert to the old control plane, assuming you have not yet uninstalled it. However, you will have to reinstall gateways for the old version manually, as the uninstall command does not revert them automatically.
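
    To move workloads back to the old control plane as part of such a rollback (assuming it was installed without a revision, as in this setup), the namespace labels from step 5 can simply be reverted and the deployments restarted - a sketch mirroring the earlier relabelling step:

    # Point the namespace back at the default (non-revisioned) injector, then restart workloads
    kubectl label namespace test-ns istio.io/rev- istio-injection=enabled
    kubectl rollout restart deployment -n test-ns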


    I would strongly recommend creating a temporary test environment: recreate the existing cluster in the test env, perform the upgrade there, and adjust the process to meet your needs.
    This way you will avoid catastrophic failures on your production environment.