Tags: kubernetes, etcd, amazon-efs, kubernetes-pvc

kubernetes aws-efs-csi-driver and permissions


I'm using the bitnami/etcd chart, which can create snapshots via an EFS-mounted PVC.

However, I get a permission error once the aws-efs-csi-driver is provisioned and the PVC is mounted into any non-root pod (uid/gid is 1001).

I'm using the Helm chart https://kubernetes-sigs.github.io/aws-efs-csi-driver/, version 2.2.0.
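
For reference, I installed the driver along these lines (the chart name, release name, and target namespace follow the upstream repository's README; the values file name is mine):

helm repo add aws-efs-csi-driver https://kubernetes-sigs.github.io/aws-efs-csi-driver/
helm repo update
helm upgrade --install aws-efs-csi-driver aws-efs-csi-driver/aws-efs-csi-driver \
  --namespace kube-system \
  --version 2.2.0 \
  --values efs-csi-values.yaml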

The chart values:

# you can obtain the fileSystemId with
# aws efs describe-file-systems --query "FileSystems[*].FileSystemId"
storageClasses:
  - name: efs
    parameters:
      fileSystemId: fs-exxxxxxx
      directoryPerms: "777"
      gidRangeStart: "1000"
      gidRangeEnd: "2000"
      basePath: "/snapshots"

# enable it after the following issue is resolved
# https://github.com/bitnami/charts/issues/7769
# node:
#   nodeSelector:
#     etcd: "true"

I then manually created the PV:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: etcd-snapshotter-pv
  annotations:
    argocd.argoproj.io/sync-wave: "60"
spec:
  capacity:
    storage: 32Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-exxxxxxx

Then, if I mount that EFS PVC in a non-root pod, I get the following error:

➜ klo etcd-snapshotter-001-ph8w9                          
etcd 23:18:38.76 DEBUG ==> Using endpoint etcd-snapshotter-001-ph8w9:2379
{"level":"warn","ts":1633994320.7789018,"logger":"client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0005ea380/#initially=[etcd-snapshotter-001-ph8w9:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.120.2.206:2379: connect: connection refused\""}
etcd-snapshotter-001-ph8w9:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster
etcd 23:18:40.78 WARN  ==> etcd endpoint etcd-snapshotter-001-ph8w9:2379 not healthy. Trying a different endpoint
etcd 23:18:40.78 DEBUG ==> Using endpoint etcd-2.etcd-headless.etcd.svc.cluster.local:2379
etcd-2.etcd-headless.etcd.svc.cluster.local:2379 is healthy: successfully committed proposal: took = 1.6312ms
etcd 23:18:40.87 INFO  ==> Snapshotting the keyspace
Error: could not open /snapshots/db-2021-10-11_23-18.part (open /snapshots/db-2021-10-11_23-18.part: permission denied)

As a workaround, I have to spawn a new "root" pod, exec into it, and manually adjust the permissions:

apiVersion: v1
kind: Pod
metadata:
  name: perm
spec:
  securityContext:
    runAsUser: 0
    runAsGroup: 0
    fsGroup: 0    
  containers:
  - name: app1
    image: busybox
    command: ["/bin/sh"]
    args: ["-c", "sleep 3000"]
    volumeMounts:
    - name: persistent-storage
      mountPath: /snapshots
    securityContext:
      runAsUser: 0
      runAsGroup: 0
  volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: etcd-snapshotter
  nodeSelector:
    etcd: "true"

k apply -f setup.yaml
k exec -ti perm -- ash
/ # cd /snapshots
/snapshots # chown -R 1001.1001 .
/snapshots # chmod -R 777 .
/snapshots # exit
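
(I could script this workaround as a one-shot Job rather than an interactive pod, sketched below with the claim name and node selector from above, but I'd rather not need it at all.)

apiVersion: batch/v1
kind: Job
metadata:
  name: fix-snapshot-perms
spec:
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        etcd: "true"
      containers:
      - name: fix-perms
        image: busybox
        # run once as root and hand the directory over to the etcd uid/gid (1001)
        command: ["sh", "-c", "chown -R 1001:1001 /snapshots && chmod -R 777 /snapshots"]
        securityContext:
          runAsUser: 0
        volumeMounts:
        - name: persistent-storage
          mountPath: /snapshots
      volumes:
      - name: persistent-storage
        persistentVolumeClaim:
          claimName: etcd-snapshotter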

➜ k create job --from=cronjob/etcd-snapshotter etcd-snapshotter-001
job.batch/etcd-snapshotter-001 created

➜ klo etcd-snapshotter-001-bmv79                          
etcd 23:31:10.22 DEBUG ==> Using endpoint etcd-1.etcd-headless.etcd.svc.cluster.local:2379
etcd-1.etcd-headless.etcd.svc.cluster.local:2379 is healthy: successfully committed proposal: took = 2.258532ms
etcd 23:31:10.32 INFO  ==> Snapshotting the keyspace
{"level":"info","ts":1633995070.4244702,"caller":"snapshot/v3_snapshot.go:68","msg":"created temporary db file","path":"/snapshots/db-2021-10-11_23-31.part"}
{"level":"info","ts":1633995070.4907935,"logger":"client","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1633995070.4908395,"caller":"snapshot/v3_snapshot.go:76","msg":"fetching snapshot","endpoint":"etcd-1.etcd-headless.etcd.svc.cluster.local:2379"}
{"level":"info","ts":1633995070.4965465,"logger":"client","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"}
{"level":"info","ts":1633995070.544217,"caller":"snapshot/v3_snapshot.go:91","msg":"fetched snapshot","endpoint":"etcd-1.etcd-headless.etcd.svc.cluster.local:2379","size":"320 kB","took":"now"}
{"level":"info","ts":1633995070.5507936,"caller":"snapshot/v3_snapshot.go:100","msg":"saved","path":"/snapshots/db-2021-10-11_23-31"}
Snapshot saved at /snapshots/db-2021-10-11_23-31

➜ k exec -ti perm -- ls -la /snapshots                             
total 924
drwxrwxrwx    2 1001     1001          6144 Oct 11 23:31 .
drwxr-xr-x    1 root     root            46 Oct 11 23:25 ..
-rw-------    1 1001     root        319520 Oct 11 23:31 db-2021-10-11_23-31

Is there a way to automate this, so that the volume is writable by the non-root user from the start?

I have this setting in the storage class:

gidRangeStart: "1000"
gidRangeEnd: "2000"

but it has no effect.

The PVC is defined as:

➜ kg pvc etcd-snapshotter -o yaml                
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: efs.csi.aws.com
  name: etcd-snapshotter
  namespace: etcd
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 32Gi
  storageClassName: efs
  volumeMode: Filesystem
  volumeName: etcd-snapshotter-pv

Solution

  • By default the StorageClass field provisioningMode is unset; set it to provisioningMode: "efs-ap" to enable dynamic provisioning with EFS access points.
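
With that set, the StorageClass values from the question become (a sketch; unchanged apart from the added provisioningMode, same placeholder filesystem ID):

storageClasses:
  - name: efs
    parameters:
      provisioningMode: "efs-ap"
      fileSystemId: fs-exxxxxxx
      directoryPerms: "777"
      gidRangeStart: "1000"
      gidRangeEnd: "2000"
      basePath: "/snapshots"

This also explains why gidRangeStart/gidRangeEnd had no effect: those parameters are only consumed when the driver dynamically provisions an EFS access point per volume, and with the manually created PV (whose volumeHandle points at the bare filesystem) they are ignored. Once provisioningMode is set, the hand-made PV and the volumeName field in the PVC can be dropped; the provisioner creates a PV backed by a fresh access point under /snapshots, owned by a GID from the configured range, so the non-root (1001) pod can write without any chown. (If you prefer to keep static provisioning, an access point created out of band can instead be referenced in the volumeHandle as fs-exxxxxxx::fsap-xxxxxxxx.)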