Tags: kubernetes, persistent-volumes, jupyterhub, kubernetes-pvc

Persisting user data and sharing folders using kubespawner


I am using kubespawner to spawn single-user notebook servers on a Kubernetes cluster.

I have two goals:

  1. User-specific data, which lives under /home/iamuser/ on each spawned user pod, should be persisted and kept private: each user's pod should see only that user's own data under /home/iamuser/, and the data should survive redeployments.
  2. Shared folders: every pod (the main pod and the spawned user pods) should have a folder /home/iamuser/shared as well as /jupyter-shared/. Both should be persisted and shared across user pods.

My current approach:

volumeMount:
  - mountPath: /home/iamuser/shared
    name: shared-pvc
  - mountPath: /jupyter-shared
    name: shared-pvc
pvc:
  shared-pvc:
    name: shared-pvc
    accessModes: ["ReadWriteMany"]
    resources:
      requests:
        storage: 16Gi

And for the kubespawner configuration within my jupyter_config.py:

import os

# Shared PVC for folders we want to share across all spawned pods.
# app_name is assumed to be defined earlier in this config file.
shared_pvc_name = f"{app_name.lower().replace('_', '-')}-shared-pvc"
volumes_list = [{"name": shared_pvc_name, "persistentVolumeClaim": {"claimName": shared_pvc_name}}]
volume_mounts_list = [
    {"name": shared_pvc_name, "mountPath": "/jupyter-shared"},
    {"name": shared_pvc_name, "mountPath": "/home/iamuser/shared"},
]

# Environment variables are strings, so parse the flag explicitly;
# otherwise a value such as "false" would still be truthy.
persist_user_data = os.environ.get("JUPYTERHUB_PERSIST_USER_DATA", "").lower() in ("1", "true", "yes")
pvc = f"{app_name.lower().replace('_', '-')}-users-pvc"
if persist_user_data:
    volumes_list.append({"name": pvc, "persistentVolumeClaim": {"claimName": pvc}})
    # subPath is a path inside the PVC, so each user gets a separate directory on the volume.
    volume_mounts_list.append({"name": pvc, "mountPath": "/home/iamuser", "subPath": "{username}"})

c.KubeSpawner.volumes = volumes_list
c.KubeSpawner.volume_mounts = volume_mounts_list

I have managed to make this work, but I am not entirely sure why it works. I am concerned that the PVC for user-specific data is copying the contents of shared into itself. Could this cause a loop?

Furthermore, I realised that if I do not add {"name": shared_pvc_name, "mountPath": "/home/iamuser/shared"} to my Jupyter config, my spawned pods do not see /home/iamuser/shared at all; only the main pod has it. Initially I thought I could put this definition in the Kubernetes spec only, but that doesn't seem to be the case?

Note: the PVCs defined in the Helm definition get prefixed with the app name on deployment, hence the prefix strings in the Jupyter config.


Solution

  • I believe that, because of how Linux mounting works, there is no cyclic copying here: mounting a volume at a path makes its contents visible at that path and does not copy anything between PVCs.

    In addition, I should add a subPath to both shared mounts so that a file copied into one of them doesn't also show up in the other directory.

    I believe my confusion came from what subPath really does: it is not a subdirectory in the container but rather a sub-path on the PVC.
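
    The subPath change described above could look roughly like the sketch below. It is only an illustration under assumptions: build_mounts is a hypothetical helper, and the subPath names "jupyter-shared" and "home-shared" are made up for the example; the PVC naming and mount paths follow the question's config.

    ```python
    def build_mounts(app_name, persist_user_data=False):
        """Return (volumes, volume_mounts) lists in the shape KubeSpawner expects.

        Hypothetical helper for illustration; the subPath directory names are assumptions.
        """
        prefix = app_name.lower().replace("_", "-")
        shared_pvc = f"{prefix}-shared-pvc"

        volumes = [{"name": shared_pvc, "persistentVolumeClaim": {"claimName": shared_pvc}}]
        # Same PVC mounted twice, but each mount points at a different
        # directory inside the PVC, so the two folders no longer mirror each other.
        mounts = [
            {"name": shared_pvc, "mountPath": "/jupyter-shared", "subPath": "jupyter-shared"},
            {"name": shared_pvc, "mountPath": "/home/iamuser/shared", "subPath": "home-shared"},
        ]

        if persist_user_data:
            users_pvc = f"{prefix}-users-pvc"
            volumes.append({"name": users_pvc, "persistentVolumeClaim": {"claimName": users_pvc}})
            # "{username}" is left as a template string; it names a per-user
            # directory inside the users PVC, not a directory in the container.
            mounts.append({"name": users_pvc, "mountPath": "/home/iamuser", "subPath": "{username}"})

        return volumes, mounts


    volumes, mounts = build_mounts("my_app", persist_user_data=True)
    # In jupyter_config.py these would then be assigned as in the question:
    # c.KubeSpawner.volumes = volumes
    # c.KubeSpawner.volume_mounts = mounts
    ```

    With this layout, a file written to /jupyter-shared lands in the jupyter-shared directory of the shared PVC, while /home/iamuser/shared maps to a separate home-shared directory on the same PVC.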