I am using GCSFuse to mount a GCS bucket into my user pod in JupyterHub, but it always fails with the error message "gcsfuse takes exactly two arguments".
Here is my Dockerfile:
FROM jupyter/minimal-notebook:177037d09156
ENV GCSFUSE_REPO gcsfuse-stretch
ENV GOOGLE_APPLICATIONS_CREDENTIALS=test-serviceaccount.json
ENV GCS_BUCKET="my-bucket"
ENV GCS_BUCKET_FOLDER="shared-data"
USER root
# Add google repositories for gcsfuse and google cloud sdk
RUN apt-get update -y && apt-get install -y --no-install-recommends apt-transport-https ca-certificates curl gnupg
RUN echo "deb http://packages.cloud.google.com/apt $GCSFUSE_REPO main" | tee /etc/apt/sources.list.d/gcsfuse.list
RUN echo "deb https://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
# Install gcsfuse and google cloud sdk
RUN apt-get update -y && apt-get install -y gcsfuse google-cloud-sdk \
&& apt-get autoremove -y \
&& apt-get clean -y \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
# Switch back to notebook user (defined in the base image)
USER $NB_UID
# make directory for mounting
RUN mkdir -p home/shared-data \
&& mkdir -p etc/scripts
COPY start_mounting.sh etc/scripts
# install extra packages required for model training
RUN pip install --upgrade pip
RUN pip install fasttext
RUN pip install ax-platform
CMD ["/bin/bash", "etc/scripts/start_mounting.sh"]
And here is the mount script, start_mounting.sh:
#!/bin/bash
# Setup GCSFuse
gcsfuse --key-file ${GOOGLE_APPLICATIONS_CREDENTIALS} ${GCS_BUCKET} ${GCS_BUCKET_FOLDER}
My JupyterHub config.yaml:
hub:
  baseUrl: /jupyterhub
  extraConfig: |
    from kubernetes import client
    def modify_pod_hook(spawner, pod):
        pod.spec.containers[0].security_context = client.V1SecurityContext(
            privileged=True,
            capabilities=client.V1Capabilities(
                add=['SYS_ADMIN']
            )
        )
        pod.spec.containers[0].env.append(
            client.V1EnvVar(
                name='GOOGLE_APPLICATIONS_CREDENTIALS',
                value_from=client.V1EnvVarSource(
                    secret_key_ref=client.V1SecretKeySelector(
                        name='jhub-secret',
                        key='jhub-serviceaccount',
                    )
                )
            )
        )
        return pod
    c.KubeSpawner.modify_pod_hook = modify_pod_hook
singleuser:
  storage:
    type: none
  extraEnv:
    GCS_BUCKET: "my-bucket"
    GCS_BUCKET_FOLDER: "shared-data"
  lifecycleHooks:
    postStart:
      exec:
        command: ["/bin/sh", "etc/scripts/start_mounting.sh"]
    preStop:
      exec:
        command: ["fusermount", "-u", "shared-data"]
  image:
    name: gcr.io/project/base-images/jhub-k8s-cust-singleuser
    tag: 1.1.6
    pullPolicy: Always
I am overriding the GOOGLE_APPLICATIONS_CREDENTIALS environment variable so that I can use it as the --key-file argument to gcsfuse.
Could someone please tell me what is wrong here? Is the problem in my pod's postStart exec command, or in the gcsfuse invocation itself?
I solved it by mounting the Kubernetes secret (the Google service account key) as a volume and passing the path of the mounted key file as an environment variable to the gcsfuse command in start_mounting.sh.
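A likely explanation for the original error: an environment variable filled through secretKeyRef holds the secret's value, i.e. the full JSON of the service account key, not a path to it. Because ${GOOGLE_APPLICATIONS_CREDENTIALS} is unquoted in start_mounting.sh, the shell splits that JSON on whitespace into many words; --key-file consumes only the first one, and the rest become extra positional arguments. The Python sketch below approximates bash's default field splitting with str.split() (it ignores globbing and custom IFS values), and the key contents are a made-up, truncated example rather than a real key.

```python
from __future__ import annotations


def expand_unquoted(value: str) -> list[str]:
    # Approximates bash expanding an unquoted $VAR with the default IFS:
    # split on spaces, tabs and newlines. Globbing is ignored here.
    return value.split()


def gcsfuse_argv(key_value: str, bucket: str, mount_point: str) -> list[str]:
    # Mirrors: gcsfuse --key-file ${KEY} ${GCS_BUCKET} ${GCS_BUCKET_FOLDER}
    return (["gcsfuse", "--key-file"] + expand_unquoted(key_value)
            + expand_unquoted(bucket) + expand_unquoted(mount_point))


def positional_args(argv: list[str]) -> list[str]:
    # Drop the program name, then --key-file together with the single value it takes.
    rest = argv[1:]
    i = rest.index("--key-file")
    return rest[:i] + rest[i + 2:]


# Hypothetical, truncated key contents as an env var filled via secretKeyRef would hold them.
KEY_JSON = '{\n  "type": "service_account",\n  "project_id": "my-project"\n}'
# Path form, as used in the fix (file mounted from the secret volume).
KEY_PATH = "/etc/secrets/key.json"

if __name__ == "__main__":
    bad = positional_args(gcsfuse_argv(KEY_JSON, "my-bucket", "shared-data"))
    good = positional_args(gcsfuse_argv(KEY_PATH, "my-bucket", "shared-data"))
    print(len(bad), bad)
    print(len(good), good)
```

With the sample key, the JSON version leaves seven positional arguments while the path version leaves exactly two (bucket and mount point), which is what gcsfuse expects. An empty GCS_BUCKET or GCS_BUCKET_FOLDER would break the count in the other direction, since an unquoted empty variable expands to no word at all.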
Below is the configuration I used (these keys go under singleuser in config.yaml):
storage:
  extraVolumes:
    - name: my-secret-jupyterhub
      secret:
        secretName: my-secret
  extraVolumeMounts:
    - name: my-secret-jupyterhub
      mountPath: /etc/secrets
      readOnly: true
extraEnv:
  GOOGLE_APPLICATIONS_CREDENTIALS: /etc/secrets/key.json
This seems to be a cleaner approach than reading the contents of the service account key into an environment variable and then writing them back to a file for the gcsfuse command, as I was doing before.
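If you would rather not depend on shell quoting at all, the mount step could be driven from a small Python script that hands an argument list straight to gcsfuse, so no word splitting or globbing happens. This is a minimal sketch, assuming gcsfuse is on PATH, the mount point already exists, and GOOGLE_APPLICATIONS_CREDENTIALS, GCS_BUCKET and GCS_BUCKET_FOLDER are set as in the config above. The function names are hypothetical, and the "service_account" check relies on the "type" field present in downloaded Google service account key files.

```python
from __future__ import annotations

import json
import os
import subprocess


def build_mount_command(env: dict[str, str]) -> list[str]:
    """Build the gcsfuse argv from environment variables (hypothetical helper)."""
    key_file = env["GOOGLE_APPLICATIONS_CREDENTIALS"]
    bucket = env["GCS_BUCKET"]
    mount_point = env["GCS_BUCKET_FOLDER"]
    # Guard against the original mistake: the variable must hold a path,
    # not the JSON contents of the key.
    if key_file.lstrip().startswith("{"):
        raise ValueError("GOOGLE_APPLICATIONS_CREDENTIALS holds key contents, not a file path")
    return ["gcsfuse", "--key-file", key_file, bucket, mount_point]


def check_key_file(path: str) -> None:
    # Fail early with a clear message if the mounted secret is missing
    # or does not look like a service account key.
    with open(path) as f:
        data = json.load(f)
    if data.get("type") != "service_account":
        raise ValueError(f"{path} does not look like a service account key")


def mount(env: dict[str, str] | None = None) -> None:
    env = dict(os.environ) if env is None else env
    cmd = build_mount_command(env)
    check_key_file(cmd[2])
    # A list argv reaches gcsfuse unchanged: each element is exactly one argument.
    subprocess.run(cmd, check=True)
```

Calling mount() from the postStart hook (for example via python3 -c with a hypothetical module name) would raise a KeyError or ValueError with a readable message instead of the terse gcsfuse usage error when the configuration is wrong.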