I am using kubespawner to spawn single-user notebook servers on a Kubernetes cluster.
I have two goals:
1. Have a private directory at /home/iamuser/ on each user's spawned pod. It should be persisted and private, i.e. each spawned pod (user) should only be able to see its own data under /home/iamuser/, and this data should be persisted across redeployments etc.
2. Have /home/iamuser/shared as well as /jupyter-shared/. These would have to be persisted and shared across user pods.
My current approach:
volumeMount:
  - mountPath: /home/iamuser/shared
    name: shared-pvc
  - mountPath: /jupyter-shared
    name: shared-pvc
pvc:
  shared-pvc:
    name: shared-pvc
    accessModes: ["ReadWriteMany"]
    resources:
      requests:
        storage: 16Gi
And for the kubespawner configuration within my jupyter_config.py:
#shared pvc for mounting folders we would like to share across pod spawns
shared_pvc_name = f"{app_name.lower().replace('_', '-')}-shared-pvc"
volumes_list = [{"name": shared_pvc_name, "persistentVolumeClaim": {"claimName": shared_pvc_name}}]
volume_mounts_list = [{"name": shared_pvc_name, "mountPath": "/jupyter-shared"},{"name": shared_pvc_name, "mountPath": "/home/iamuser/shared"}]
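# Environment variables are strings, so parse the flag explicitly ("false" would otherwise be truthy)
persist_user_data = os.environ.get("JUPYTERHUB_PERSIST_USER_DATA", "").lower() in ("1", "true", "yes")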
pvc = f"{app_name.lower().replace('_', '-')}-users-pvc"
if persist_user_data:
    volumes_list.append({"name": pvc, "persistentVolumeClaim": {"claimName": pvc}})
    volume_mounts_list.append({"name": pvc, "mountPath": "/home/iamuser", "subPath": "{username}"})
c.KubeSpawner.volumes = volumes_list
c.KubeSpawner.volume_mounts = volume_mounts_list
I have managed to make this work, but I am not entirely sure why it works. I am concerned that the PVC for user-specific data, mounted at /home/iamuser, is copying the contents of the shared mount at /home/iamuser/shared into itself. Could this cause a loop?
Furthermore, I realised that if I do not add {"name": shared_pvc_name, "mountPath": "/home/iamuser/shared"} to my jupyter config, then my spawned pods will not even see /home/iamuser/shared; only the main pod will have it. Initially I thought I could have this definition in the Kubernetes spec only, but it doesn't seem so?
Note: the PVCs defined in the helm definition get prefixed with the app name on deployment, hence the prefix strings in the jupyter config.
So I believe that, due to how the Linux mount system works, there is no cyclic copying here: a mount only makes a volume visible at a path and does not copy data between volumes.
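A simplified model of path resolution may make this concrete. The sketch below is not how Linux or Kubernetes is actually implemented; it only picks the mount with the deepest matching mount point, which is enough to show that a path under /home/iamuser/shared lands on the shared volume and never on the user volume. The function name `resolve`, the volume names, and the username `alice` are made up for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical mount table mirroring the question: (mountPath, volume, subPath).
MOUNTS = [
    ("/jupyter-shared", "shared-pvc", ""),
    ("/home/iamuser/shared", "shared-pvc", ""),
    ("/home/iamuser", "users-pvc", "alice"),  # "{username}" expanded for an assumed user "alice"
]


def resolve(path, mounts=MOUNTS):
    """Return (volume, path inside that volume) for a container path.

    Simplified model: the deepest mount point that contains the path wins.
    Returns None when no mount covers the path.
    """
    target = PurePosixPath(path)
    best = None
    for mount_path, volume, sub_path in mounts:
        mount = PurePosixPath(mount_path)
        if target == mount or mount in target.parents:
            if best is None or len(mount.parts) > len(best[0].parts):
                best = (mount, volume, sub_path)
    if best is None:
        return None  # the path lives on the container's own filesystem
    mount, volume, sub_path = best
    inside = PurePosixPath("/") / sub_path / target.relative_to(mount)
    return volume, str(inside)


print(resolve("/home/iamuser/notebook.ipynb"))   # users-pvc, under the "alice" sub-path
print(resolve("/home/iamuser/shared/data.csv"))  # shared-pvc, not users-pvc
print(resolve("/jupyter-shared/data.csv"))       # same volume and same inner path as the line above
```

In this model the two shared mount points resolve to the same location on the shared volume, which is why a file written to one also appears in the other when no subPath is set.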
In addition, I should add a subPath to both shared mounts so that a file copied to one of them doesn't also show up in the other directory.
I believe my confusion came from what subPath really does: it is not a subdirectory in the container but rather a sub-path within the PVC.
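As a rough sketch of that change, the mounts could be built as below. The helper name `build_storage` and the subPath values `jupyter-shared` and `home-shared` are my own placeholders, not anything kubespawner requires; only the dictionary keys follow the config shown in the question.

```python
def build_storage(shared_pvc_name, users_pvc_name=None):
    """Return (volumes, volume_mounts) in the list-of-dicts shape used in the question.

    Each shared mount gets its own subPath, so the two directories are backed
    by different folders inside the same shared PVC.
    """
    volumes = [
        {"name": shared_pvc_name, "persistentVolumeClaim": {"claimName": shared_pvc_name}}
    ]
    volume_mounts = [
        # subPath values are illustrative placeholders
        {"name": shared_pvc_name, "mountPath": "/jupyter-shared", "subPath": "jupyter-shared"},
        {"name": shared_pvc_name, "mountPath": "/home/iamuser/shared", "subPath": "home-shared"},
    ]
    if users_pvc_name:
        volumes.append(
            {"name": users_pvc_name, "persistentVolumeClaim": {"claimName": users_pvc_name}}
        )
        # One folder per user inside the users PVC, as in the original config
        volume_mounts.append(
            {"name": users_pvc_name, "mountPath": "/home/iamuser", "subPath": "{username}"}
        )
    return volumes, volume_mounts


# In jupyter_config.py this would be used roughly as:
#   c.KubeSpawner.volumes, c.KubeSpawner.volume_mounts = build_storage(shared_pvc_name, pvc)
```

Keeping the per-user mount optional mirrors the `persist_user_data` switch in the original config: pass `None` for the second argument when user data should not be persisted.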