Tags: kubernetes, kubernetes-helm

K8S groundnuty/k8s-wait-for image failing to start as init container (with helm)


I'm facing a problem with the image groundnuty/k8s-wait-for (the project is on GitHub and the image is on Docker Hub).

I'm pretty sure the error is in the command arguments, as the init container fails with Init:CrashLoopBackOff.

About the image: it is meant to be used in init containers that need to postpone a pod's startup. The script inside the image waits for a specified pod or job to complete; once it has completed, the main container in every replica is allowed to start.

In my example, it should wait for a job named {{ .Release.Name }}-os-server-migration-{{ .Release.Revision }} to finish, and once it detects that the job has finished it should let the main containers start. Helm templates are used.

As I understand it, the job name is {{ .Release.Name }}-os-server-migration-{{ .Release.Revision }}, and the second argument of the init container in deployment.yml must match it so that the init container waits for that specific job. Are there any other opinions on, or experiences with, this approach?

The templates are attached below.

DEPLOYMENT.YML:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-os-{{ .Release.Revision }}
  namespace: {{ .Values.namespace }}
  labels:
    app: {{ .Values.fullname }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Values.fullname }}
  template:
    metadata:
      labels:
        app: {{ .Values.fullname }}
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: 8080
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
      initContainers:
        - name: "{{ .Chart.Name }}-init"
          image: "groundnuty/k8s-wait-for:v1.3"
          imagePullPolicy: "{{ .Values.init.pullPolicy }}"
          args:
            - "job"
            - "{{ .Release.Name }}-os-server-migration-{{ .Release.Revision }}"

JOB.YML:

apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-os-server-migration-{{ .Release.Revision }}
  namespace: {{ .Values.migration.namespace }}
spec:
  backoffLimit: {{ .Values.migration.backoffLimit }}
  template:
    spec:
      {{- with .Values.migration.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      containers:
        - name: {{ .Values.migration.fullname }}
          image: "{{ .Values.migration.image.repository }}:{{ .Values.migration.image.tag }}"
          imagePullPolicy: {{ .Values.migration.image.pullPolicy }}
          command:
            - sh
            - /app/migration-entrypoint.sh
      restartPolicy: {{ .Values.migration.restartPolicy }}
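
For reference, the two templates above pull most of their configuration from the chart's values. A minimal values.yaml sketch that would satisfy them could look like the following; the image names, tags, and most settings are illustrative assumptions, with only the namespace and replica count taken from the output shown later.

VALUES.YAML (illustrative sketch):

# Illustrative values only -- image repositories, tags, and limits below are
# assumptions, not taken from the original chart.
namespace: development
fullname: os-server
replicaCount: 3

imagePullSecrets: []

image:
  repository: registry.example.com/os-server   # hypothetical image
  tag: "1.0.0"
  pullPolicy: IfNotPresent

init:
  pullPolicy: IfNotPresent

resources: {}

migration:
  namespace: development
  fullname: os-server-migration
  backoffLimit: 3
  restartPolicy: Never
  imagePullSecrets: []
  image:
    repository: registry.example.com/os-server-migration   # hypothetical image
    tag: "1.0.0"
    pullPolicy: IfNotPresent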

LOGS:

  Normal   Scheduled  46s                default-scheduler  Successfully assigned development/octopus-dev-release-os-1-68cb9549c8-7jggh to minikube
  Normal   Pulled     41s                kubelet            Successfully pulled image "groundnuty/k8s-wait-for:v1.3" in 4.277517553s
  Normal   Pulled     36s                kubelet            Successfully pulled image "groundnuty/k8s-wait-for:v1.3" in 3.083126925s
  Normal   Pulling    20s (x3 over 45s)  kubelet            Pulling image "groundnuty/k8s-wait-for:v1.3"
  Normal   Created    18s (x3 over 41s)  kubelet            Created container os-init
  Normal   Started    18s (x3 over 40s)  kubelet            Started container os-init
  Normal   Pulled     18s                kubelet            Successfully pulled image "groundnuty/k8s-wait-for:v1.3" in 1.827195139s
  Warning  BackOff    4s (x4 over 33s)   kubelet            Back-off restarting failed container

kubectl get all -n development

NAME                                                        READY   STATUS                  RESTARTS   AGE
pod/octopus-dev-release-os-1-68cb9549c8-7jggh   0/1     Init:CrashLoopBackOff   2          44s
pod/octopus-dev-release-os-1-68cb9549c8-9qbdv   0/1     Init:CrashLoopBackOff   2          44s
pod/octopus-dev-release-os-1-68cb9549c8-c8h5k   0/1     Init:Error              2          44s
pod/octopus-dev-release-os-migration-1-9wq76    0/1     Completed               0          44s
......
......
NAME                                                       COMPLETIONS   DURATION   AGE
job.batch/octopus-dev-release-os-migration-1   1/1           26s        44s


Solution

  • For anyone facing the same issue, here is an explanation of my fix.

    The problem was that the containers in deployment.yaml had no permission to use the Kubernetes API, so the groundnuty/k8s-wait-for:v1.3 init container could not check whether the job {{ .Release.Name }}-os-server-migration-{{ .Release.Revision }} had completed. That's why the init containers failed immediately and went into CrashLoopBackOff.

    After adding a ServiceAccount, Role, and RoleBinding, everything worked: groundnuty/k8s-wait-for:v1.3 successfully waited for the migration job to finish before letting the main container start.

    Here are the manifests for the ServiceAccount, Role, and RoleBinding that solved the issue (see also the note below the manifests on wiring the ServiceAccount into the deployment).

    sa.yaml

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: sa-migration
      namespace: development
    

    role.yaml

    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: migration-reader
      namespace: development
    rules:
      - apiGroups: ["batch","extensions"]
        resources: ["jobs"]
        verbs: ["get","watch","list"]
    

    role-binding.yaml

    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: migration-reader
      namespace: development
    subjects:
    - kind: ServiceAccount
      name: sa-migration
      namespace: development
    roleRef:
      kind: Role
      name: migration-reader
      apiGroup: rbac.authorization.k8s.io
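
    One detail the manifests above do not show: for these permissions to reach the init container, the deployment's pod spec has to run under the new ServiceAccount, since a RoleBinding does nothing for pods that still run as the default ServiceAccount. A minimal sketch of that wiring, based on the deployment.yml from the question, could look like this (serviceAccountName is the only addition):

    # deployment.yml (excerpt) -- sketch only; assumes sa-migration lives in the
    # same namespace as the deployment (development)
    spec:
      template:
        spec:
          serviceAccountName: sa-migration
          initContainers:
            - name: "{{ .Chart.Name }}-init"
              image: "groundnuty/k8s-wait-for:v1.3"
              args:
                - "job"
                - "{{ .Release.Name }}-os-server-migration-{{ .Release.Revision }}"

    You can check that the binding works with kubectl auth can-i get jobs -n development --as=system:serviceaccount:development:sa-migration, which should print "yes".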