amazon-web-services kubernetes vault csi-driver

Mount secrets volumes with the CSI driver and Vault provider in Kubernetes when the pod has tolerations and node affinity


I have a Kubernetes cluster in AWS with two node groups: one for Spot instances and one for On-Demand instances. I have installed Vault and the Secrets Store CSI driver to manage the secrets.

When I create this Deployment, everything works fine: the pods are created, they run, and the secrets are available.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: vault-test
  name: vault-test
  namespace: development
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vault-test
  strategy: {}
  template:
    metadata:
      labels:
        app: vault-test
    spec:
      containers:
      - image: jweissig/app:0.0.1
        name: app
        envFrom:
        - secretRef:
            name: dev-secrets
        resources: {}
        volumeMounts:
        - name: secrets-store-inline
          mountPath: "/mnt/secrets"
          readOnly: true
      serviceAccountName: vault-sa
      volumes:
        - name: secrets-store-inline
          csi:
            driver: secrets-store.csi.k8s.io
            readOnly: true
            volumeAttributes:
              secretProviderClass: dev
status: {}

But when I add nodeAffinity and tolerations so that the pods are scheduled on the Spot machines, the pods stay in ContainerCreating status with the following error:

Warning FailedMount 10m (x716 over 24h) kubelet MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod development/pod-name, err: error connecting to provider "vault": provider not found: provider "vault"

To test the Vault behavior I created two applications: one with no tolerations, just for testing, and the real one, with the tolerations and nodeAffinity. After a lot of tests I realized that the problem depends on where the pods are scheduled, but I don't understand why.
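
For context, the scheduling constraints described above are not shown in the question. A minimal sketch of what such a block typically looks like is given here, assuming a hypothetical node label `node-group=spot` and a hypothetical taint `spot=true:NoSchedule` on the Spot nodes; the real keys and values depend on how the node group was created.

```yaml
# Fragment that would sit under spec.template.spec of the Deployment.
# The label key/value and the taint are hypothetical placeholders.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node-group          # hypothetical node label
              operator: In
              values:
                - spot
tolerations:
  - key: spot                          # hypothetical taint key
    operator: Equal
    value: "true"
    effect: NoSchedule
```

With a block like this, the application pod is allowed onto the tainted Spot nodes, but any DaemonSet without a matching toleration is not.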


Solution

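  • The problem is the Vault CSI configuration: the DaemonSet is not running on all nodes because it is missing the tolerations for the tainted Spot nodes. The CSI driver on a node can only use a provider that is running on that same node, so on a node without the DaemonSet pod the mount fails with provider not found: provider "vault". I had to add the tolerations to the DaemonSet manifest so that there is a Pod on every node; this way all nodes know what vault is.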
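
The fix described above can be sketched as a fragment of the DaemonSet manifest. This is only an illustration: the taint `spot=true:NoSchedule` is a hypothetical placeholder and should be replaced with whatever taint is actually set on the Spot nodes.

```yaml
# Fragment of the DaemonSet manifest; only the tolerations are shown.
# The taint key, value and effect are hypothetical placeholders.
spec:
  template:
    spec:
      tolerations:
        - key: spot                    # hypothetical taint key
          operator: Equal
          value: "true"
          effect: NoSchedule
        # Alternative: a single toleration with only "operator: Exists"
        # tolerates every taint, so the DaemonSet runs on all nodes.
        # - operator: Exists
```

After the DaemonSet is updated, a new Pod should be scheduled on each Spot node, and the pods stuck in ContainerCreating should be able to mount the volume on the next kubelet retry.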