kubernetes, google-cloud-filestore

Clients in different regions can't connect to Filestore


I have two GKE clusters in the same Google Cloud project. Using the same PV/PVC YAMLs, one cluster can successfully mount the Filestore instance while the other fails (a sketch of the manifests is included after the logs below). The failing GKE cluster's events look like:

Event  : Pod [podname] Unable to attach or mount volumes: unmounted volumes=[nfs-pv], unattached volumes=[nfs-pv]: timed out waiting for the condition   FailedMount
Event  : Pod [podname] MountVolume.SetUp failed for volume "nfs-pv" : mount failed: exit status 1

The kubelet logs for the failed mount:

pod_workers.go:191] Error syncing pod [guid] ("[podname](guid)"), skipping: unmounted volumes=[nfs-pv], unattached volumes=[nfs-pv]: timed out waiting for the condition
kubelet.go:1622] Unable to attach or mount volumes for pod "podname(guid)": unmounted volumes=[nfs-pv], unattached volumes=[nfs-pv]: timed out waiting for the condition; skipping pod"
mount_linux.go:150] Mount failed: exit status 1
Output: Running scope as unit: run-r1fb543aa9a9246e0be396dd93bb424f6.scope
Mount failed: mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/c61546e6-9769-4e16-bd0b-c73f904272aa/volumes/kubernetes.io~nfs/nfs-pv --scope -- /home/kubernetes/containerized_mounter/mounter mount -t nfs 192.168.99.2:/mount /var/lib/kubelet/pods/c61546e6-9769-4e16-bd0b-c73f904272aa/volumes/kubernetes.io~nfs/nfs-pv
Output: mount.nfs: Connection timed out
Mounting command: chroot
Mounting arguments: [/home/kubernetes/containerized_mounter/rootfs mount -t nfs 192.168.99.2:/mount /var/lib/kubelet/pods/c61546e6-9769-4e16-bd0b-c73f904272aa/volumes/kubernetes.io~nfs/nfs-pv]
nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/nfs/c61546e6-9769-4e16-bd0b-c73f904272aa-nfs-pv podName:c61546e6-9769-4e16-bd0b-c73f904272aa nodeName:}" failed. No retries permitted until 2021-09-11 10:01:44.725959505 +0000 UTC m=+820955.435941160 (durationBeforeRetry 2m2s). Error: "MountVolume.SetUp failed for volume \"nfs-pv\" (UniqueName: \"kubernetes.io/nfs/c61546e6-9769-4e16-bd0b-c73f904272aa-nfs-pv\") pod \"podname\" (UID: \"c61546e6-9769-4e16-bd0b-c73f904272aa\") : mount failed: exit status 1\nMounting command: systemd-run\nMounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/c61546e6-9769-4e16-bd0b-c73f904272aa/volumes/kubernetes.io~nfs/nfs-pv --scope -- /home/kubernetes/containerized_mounter/mounter mount -t nfs 192.168.99.2:/mount /var/lib/kubelet/pods/c61546e6-9769-4e16-bd0b-c73f904272aa/volumes/kubernetes.io~nfs/nfs-pv\nOutput: Running scope as unit: run-r1fb543aa9a9246e0be396dd93bb424f6.scope\nMount failed: mount failed: exit status 32\nMounting command: chroot\nMounting arguments: [/home/kubernetes/containerized_mounter/rootfs mount -t nfs 192.168.99.2:/mount /var/lib/kubelet/pods/c61546e6-9769-4e16-bd0b-c73f904272aa/volumes/kubernetes.io~nfs/nfs-pv]\nOutput: mount.nfs: Connection timed out\n"
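
For reference, the PV/PVC manifests used on both clusters look roughly like the sketch below. The server IP and share path come from the mount output above; the PVC name, storage size, and access mode are illustrative, not taken from the original manifests.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 1Ti
  accessModes:
    - ReadWriteMany
  nfs:
    server: 192.168.99.2   # Filestore instance IP (from the mount logs)
    path: /mount           # Filestore share name (from the mount logs)
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc            # illustrative name
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""     # bind statically to the PV above, not via a StorageClass
  volumeName: nfs-pv
  resources:
    requests:
      storage: 1Ti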

Perhaps one clue is that the two clusters are in separate regions and separate subnets? Why can one cluster connect to the Filestore instance but not the other?


Solution

  • After many, many hours of debugging I was able to find an answer, based on this thread: https://groups.google.com/g/google-cloud-filestore-discuss/c/wKTT6hEzk08

    The failing GKE cluster was in a subnet outside of the RFC 1918 ranges, and was therefore not accepted as a Filestore client. Once we changed the subnet to a valid RFC 1918 range, both GKE clusters were able to successfully mount the Filestore instance (a quick way to check a subnet's range is shown at the end of this answer).

    This was extremely frustrating, given that the RFC 1918 requirement is not made clear in the documentation or troubleshooting guides - and in fact other Google Cloud services worked fine with the non-RFC 1918 subnet.
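
    For anyone hitting the same issue, a quick check (sketched below with placeholder subnet and region names) is to print the subnet's primary CIDR and compare it against the RFC 1918 ranges:

    # Print the primary IP range of the subnet used by the failing cluster's nodes.
    # SUBNET_NAME and REGION are placeholders; substitute your own values.
    gcloud compute networks subnets describe SUBNET_NAME \
        --region=REGION \
        --format="value(ipCidrRange)"

    # At the time of writing, Filestore only accepted clients whose addresses fall
    # inside the RFC 1918 ranges: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16.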