kubernetes redis google-kubernetes-engine persistent-volumes csi-driver

GKE: Failed to create snapshot content with error cannot find CSI PersistentVolumeSource


I am setting up a backup solution in GKE for my Redis cluster, and I get this error when I try to use a VolumeSnapshot object in Kubernetes. I have enabled the CSI driver add-on in the cluster, deployed the Redis cluster with the Bitnami chart, and deployed the following resources:

VolumeSnapshotClass.yaml:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: pd-snapshot-class
driver: pd.csi.storage.gke.io
deletionPolicy: Retain
parameters: 

VolumeSnapshot.yaml:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: new-snapshot-test3
spec:
  volumeSnapshotClassName: pd-snapshot-class
  source:
    persistentVolumeClaimName: redis-data-redis-cluster-0

And the error I get is this:

Failed to create snapshot content with error cannot find CSI PersistentVolumeSource for volume pvc-0c760ee2-a999-4c18-b103-fd2ae0922ecf

I am not sure why it does not work, since I am following the documentation from Google. Maybe it is related to the way the Redis cluster is deployed.

Edit: Check the comment on the accepted answer.


Solution

  • This is not a normal backup. If you try to change anything, it might break things, and the change might get reverted. Unfortunately, there is no way to use a PVC based on the standard storage class for this, and you cannot easily migrate existing volumes to a new CSI driver.

    Try the following possible workarounds to resolve your issue:

    Workaround 1:

    It looks like you are trying to take snapshots of a volume that uses the in-tree persistent disk driver, which does not support snapshots. Snapshots are only supported for CSI drivers.
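
    One way to confirm this is to inspect the PersistentVolume bound to the claim. The fragments below are a sketch of what to look for, not output from this cluster; the disk name, project, and zone are placeholders. A volume from the in-tree driver carries a gcePersistentDisk source, while a CSI-provisioned volume carries a csi source. The "cannot find CSI PersistentVolumeSource" error indicates that the csi source is absent.

    ```yaml
    # In-tree volume: no csi source, so VolumeSnapshot cannot use it
    spec:
      gcePersistentDisk:
        pdName: DISK_NAME          # placeholder
        fsType: ext4
    ---
    # CSI volume: eligible for VolumeSnapshot
    spec:
      csi:
        driver: pd.csi.storage.gke.io
        volumeHandle: projects/PROJECT_ID/zones/ZONE/disks/DISK_NAME   # placeholders
        fsType: ext4
    ```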

    To find the CSI drivers that support snapshots, see the "Other features" column of the Drivers page in the Kubernetes CSI documentation. The documentation notes:

    Users who want to use these CSI drivers need to contact driver maintainers for driver capabilities.

    Workaround 2:

    Your issue may be that the PV was created manually as a “gcePersistentDisk” volume and was not provisioned with a CSI driver. You can leverage the dynamic provisioning feature of GKE to provision PVs by defining PVCs. Refer to Dynamically provision PersistentVolumes for more information.

    So you will need to recreate the PVCs and use the corresponding storage class (with the CSI provisioner).
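
    As a rough illustration, a storage class backed by the CSI provisioner and a claim that uses it could look like the sketch below. The names pd-csi-example and redis-data-csi, the disk type, and the requested size are assumptions made for the example. With the Bitnami chart the claims come from the StatefulSet's volume claim template, so the class would normally be set through the chart's persistence values rather than a hand-written PVC; check the chart's values for the exact key.

    ```yaml
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: pd-csi-example              # hypothetical name
    provisioner: pd.csi.storage.gke.io  # CSI driver, same as in the VolumeSnapshotClass
    parameters:
      type: pd-balanced                 # assumed disk type
    volumeBindingMode: WaitForFirstConsumer
    reclaimPolicy: Delete
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: redis-data-csi              # hypothetical name
    spec:
      storageClassName: pd-csi-example
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 8Gi                  # assumed size
    ```

    Volumes provisioned this way carry a csi source, so a VolumeSnapshot that points at the new claim should get past the error above.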

    Theoretically, you can set the PV reclaim policy to Retain, delete the PV, then recreate it with the new storage class and reset the reclaim policy as needed. However, if you are using dynamic provisioning, this makes pod management more manual in terms of storage: you would need to ensure that each pod has the correct mapping. This change is also disruptive.

    You can reuse your PVs as follows:

    1. Find the name of the GCE PD that is used by the Kubernetes PVC.
    2. Take a snapshot of the GCE PD.
    3. Create a new volume from the snapshot.
    4. Use the new volume (from step 3) in the Kubernetes pod through a PersistentVolumeClaim bound to a PersistentVolume.
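
    Step 4 can be sketched as a statically defined PersistentVolume that points at the restored disk through the CSI driver, plus a claim bound to it. All names, the size, the namespace, and the disk path below are placeholders chosen for the example, and the storageClassName only needs to match between the two objects.

    ```yaml
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: redis-restored-pv           # hypothetical name
    spec:
      storageClassName: restored-disks  # hypothetical; must match the claim
      capacity:
        storage: 8Gi                    # assumed; use the real disk size
      accessModes:
        - ReadWriteOnce
      claimRef:
        namespace: default              # assumed namespace
        name: redis-restored-pvc
      csi:
        driver: pd.csi.storage.gke.io
        volumeHandle: projects/PROJECT_ID/zones/ZONE/disks/DISK_NAME   # placeholders
        fsType: ext4
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      namespace: default
      name: redis-restored-pvc          # hypothetical name
    spec:
      storageClassName: restored-disks
      volumeName: redis-restored-pv     # bind to the PV above
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 8Gi
    ```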

    It's important to perform this in a testing environment first and verify that everything runs smoothly before you act on the production environment.

    To migrate the data between PVCs, you can, for example, mount both volumes in the same pod and, with a terminal session in the container, copy the data manually. Alternatively, see Jose Pacheco’s Medium article on Migrate Kubernetes Volume Data for more information.
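
    A minimal sketch of such a copy pod is shown below. The pod name, the image, and the target claim redis-data-new are assumptions; the source claim is the one from the question. Because these disks are ReadWriteOnce, this presumably requires the Redis pod that uses the old claim to be stopped first, and both disks to be in the same zone.

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: pvc-migrate                 # hypothetical name
    spec:
      restartPolicy: Never
      containers:
        - name: copy
          image: busybox:1.36           # assumed image; any image with cp works
          # Copy everything, preserving ownership and permissions
          command: ["sh", "-c", "cp -a /old/. /new/"]
          volumeMounts:
            - name: old
              mountPath: /old
              readOnly: true
            - name: new
              mountPath: /new
      volumes:
        - name: old
          persistentVolumeClaim:
            claimName: redis-data-redis-cluster-0
        - name: new
          persistentVolumeClaim:
            claimName: redis-data-new   # hypothetical CSI-backed claim
    ```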

    Workaround 3:

    Copying the data should not cause application interruption. An alternative approach is to keep the GCE disk and create a new PVC that references it, as documented in Using pre-existing persistent disks as PersistentVolumes.

    Alternatively, I found an open-source tool on GitHub, Interval-based Volume Snapshots and Expiry on Kubernetes, that claims to perform snapshots. Please note that I have not used this tool myself and cannot vouch for its effectiveness or offer support for it. My advice would be to test (in a non-production environment) using the beta feature to enable the CSI driver, and then work with the external-snapshotter tool.