Tags: kubernetes, google-kubernetes-engine, persistent-volumes, gce-persistent-disk

GKE Volume Attach/mount error for regional persistent disk


I am struggling with a volume attach error. I have a regional persistent disk in the same GCP project as my regional GKE cluster. The cluster is in europe-west2 with nodes in europe-west2-a, b and c; the regional disk is replicated across zones europe-west2-b and c.

I have an nfs-server Deployment manifest that references the gcePersistentDisk:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations: {}
  labels:
    app.kubernetes.io/managed-by: Helm
  name: nfs-server
  namespace: namespace
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  selector:
    matchLabels:
      role: nfs-server
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        role: nfs-server
    spec:
      serviceAccountName: nfs-server 
      containers:
      - image: gcr.io/google_containers/volume-nfs:0.8
        imagePullPolicy: IfNotPresent
        name: nfs-server
        ports:
        - containerPort: 2049
          name: nfs
          protocol: TCP
        - containerPort: 20048
          name: mountd
          protocol: TCP
        - containerPort: 111
          name: rpcbind
          protocol: TCP
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /data
          name: nfs-pvc
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - gcePersistentDisk:
          fsType: ext4
          pdName: my-regional-disk-name
        name: nfs-pvc
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.gke.io/zone
                operator: In
                values:
                - europe-west2-b
                - europe-west2-c

And my PV/PVC:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 200Gi
  nfs:
    path: /
    server: nfs-server.namespace.svc.cluster.local
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app.kubernetes.io/managed-by: Helm
  name: nfs-pvc
  namespace: namespace
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 8Gi
  storageClassName: ""
  volumeMode: Filesystem
  volumeName: nfs-pv

When I apply the Deployment manifest above, I get the following error:

'rpc error: code = Unavailable desc = ControllerPublish not permitted on node "projects/ap-mc-qa-xxx-xxxx/zones/europe-west2-a/instances/node-instance-id" due to backoff condition'

The VolumeAttachment reports this:

Attach Error: Message:  rpc error: code = NotFound desc = ControllerPublishVolume could not find volume with ID projects/UNSPECIFIED/zones/UNSPECIFIED/disks/my-regional-disk-name: googleapi: Error 0: , notFound

These manifests seemed to work fine when they were deployed against a zonal cluster/disk. I've checked things like making sure the cluster service account has the necessary permissions, and the disk is not currently in use.

What am I missing???


Solution

  • So the reason the above won't work is that the regional persistent disk feature creates persistent disks that are replicated across two zones within the same region. To use that feature, the volume must be provisioned as a PersistentVolume; referencing the volume directly from a pod is not supported. Something like this:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: nfs-pv
    spec:
      capacity:
        storage: 200Gi
      accessModes:
      - ReadWriteOnce
      gcePersistentDisk:
        pdName: my-regional-disk
        fsType: ext4
    

    Now trying to figure out how to re-configure the NFS server to use the regional disk; a rough sketch of one way to wire that up is below.
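
    As a starting point, here is a minimal sketch (not a verified setup) of how the nfs-server Deployment could consume the regional disk through a claim instead of an inline gcePersistentDisk volume. It reuses the disk name and zones from the question, but the PV/PVC names (nfs-server-pv / nfs-server-pvc) are hypothetical, and the capacity and namespace would need to match the real disk and chart values. The nodeAffinity on the PersistentVolume records which zones hold the disk's replicas, so the pod is only scheduled onto nodes that can attach it.

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: nfs-server-pv                  # hypothetical name
    spec:
      capacity:
        storage: 200Gi                     # should match the actual regional disk size
      accessModes:
      - ReadWriteOnce                      # a GCE PD attaches read-write to a single node
      persistentVolumeReclaimPolicy: Retain
      storageClassName: ""
      gcePersistentDisk:
        pdName: my-regional-disk-name      # the regional disk from the question
        fsType: ext4
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: topology.gke.io/zone
              operator: In
              values:
              - europe-west2-b             # zones the regional disk is replicated to
              - europe-west2-c
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: nfs-server-pvc                 # hypothetical name
      namespace: namespace
    spec:
      accessModes:
      - ReadWriteOnce
      storageClassName: ""
      volumeName: nfs-server-pv            # bind statically to the PV above
      resources:
        requests:
          storage: 200Gi

    The nfs-server Deployment would then mount the claim rather than the disk directly (only the volumes section changes, and the pod-level zone affinity could be dropped, since the PV's nodeAffinity already constrains scheduling):

      volumes:
      - name: nfs-pvc
        persistentVolumeClaim:
          claimName: nfs-server-pvc        # hypothetical claim name from above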