
gcsfuse command fails with "gcsfuse takes exactly two arguments"


I am using GCSFuse to mount a GCS bucket into my user pod in JupyterHub, but it always fails with the error message "gcsfuse takes exactly two arguments".

Here is my Dockerfile:

FROM jupyter/minimal-notebook:177037d09156

ENV GCSFUSE_REPO gcsfuse-stretch
ENV GOOGLE_APPLICATIONS_CREDENTIALS=test-serviceaccount.json
ENV GCS_BUCKET: "my-bucket"
ENV GCS_BUCKET_FOLDER: "shared-data"

USER root

# Add google repositories for gcsfuse and google cloud sdk
RUN apt-get update -y && apt-get install -y --no-install-recommends apt-transport-https ca-certificates curl gnupg
RUN echo "deb http://packages.cloud.google.com/apt $GCSFUSE_REPO main" | tee /etc/apt/sources.list.d/gcsfuse.list
RUN echo "deb https://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -

# Install gcsfuse and google cloud sdk
RUN apt-get update -y  && apt-get install -y gcsfuse google-cloud-sdk \
    && apt-get autoremove -y \
    && apt-get clean -y \
    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

# Switch back to notebook user (defined in the base image)
USER $NB_UID

# make directory for mounting
RUN mkdir -p home/shared-data \
    && mkdir -p etc/scripts

COPY start_mounting.sh etc/scripts

# install extra packages required for model training
RUN pip install --upgrade pip
RUN pip install fasttext
RUN pip install ax-platform

CMD ["bin/bash", "etc/scripts/start_mounting.sh"]

Script:

#!/bin/bash

# Setup GCSFuse
 gcsfuse --key-file ${GOOGLE_APPLICATIONS_CREDENTIALS} ${GCS_BUCKET} ${GCS_BUCKET_FOLDER}

My JupyterHub config.yaml:

hub:
  baseUrl: /jupyterhub
  extraConfig: |
    from kubernetes import client
    def modify_pod_hook(spawner, pod):
        pod.spec.containers[0].security_context = client.V1SecurityContext(
            privileged=True,
            capabilities=client.V1Capabilities(
                add=['SYS_ADMIN']
            )
          )
        pod.spec.containers[0].env.append(
              client.V1EnvVar(
                  name='GOOGLE_APPLICATIONS_CREDENTIALS',
                  value_from=client.V1EnvVarSource(
                      secret_key_ref=client.V1SecretKeySelector(
                          name='jhub-secret',
                          key='jhub-serviceaccount',
                      )
                  )
              )
          )
        return pod
    c.KubeSpawner.modify_pod_hook = modify_pod_hook

singleuser:
  storage:
    type: none
  extraEnv:
  GCS_BUCKET: "my-bucket"
  GCS_BUCKET_FOLDER: "shared-data"
  lifecycleHooks:
    postStart:
      exec:
        command: ["/bin/sh", "etc/scripts/start_mounting.sh"]
    preStop:
      exec:
        command: ["fusermount", "-u", "shared-data"]
  image:
    name: gcr.io/project/base-images/jhub-k8s-cust-singleuser
    tag: 1.1.6
    pullPolicy: Always

I am overriding the GOOGLE_APPLICATIONS_CREDENTIALS environment variable so that I can pass it to gcsfuse through the --key-file argument.

Could someone please tell me what is wrong here? Is something wrong with my pod's postStart exec command, or is my gcsfuse command wrong?


Solution

  • I solved it by mounting the K8s secret (the Google service account key) as a volume and passing the path of the mounted file as an ENV to the gcsfuse command in start_mounting.sh.

    Below is the code that I used:

      storage:
        extraVolumes:
          - name: my-secret-jupyterhub
            secret:
              secretName: my-secret
        extraVolumeMounts:
          - name: my-secret-jupyterhub
            mountPath: /etc/secrets
            readOnly: true
      extraEnv:
        GOOGLE_APPLICATIONS_CREDENTIALS: /etc/secrets/key.json
    

    This seems to be a cleaner approach than reading the contents of the service account key and writing them back to a file for the gcsfuse command, as I was doing previously and as discussed above.
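
One plausible reading of the original error, offered as a sketch and not a confirmed diagnosis: the gcsfuse line in start_mounting.sh expands its variables unquoted, so the shell splits each value on whitespace before gcsfuse sees it. If GOOGLE_APPLICATIONS_CREDENTIALS holds the key contents instead of a file path, or if one of the bucket variables is empty, the number of positional arguments changes. The helper count_args and all values below are hypothetical and exist only to illustrate the shell behaviour; nothing here calls gcsfuse.

```shell
#!/bin/bash
# Hypothetical helper: report how many arguments a command would receive.
count_args() { echo "$#"; }

# Illustrative values only (assumptions, not taken from a real deployment).
creds_contents='{ "type": "service_account", "project_id": "demo" }'  # key contents, not a path
creds_path="/etc/secrets/key.json"                                    # path to a mounted key file
bucket="my-bucket"
folder="shared-data"
empty_bucket=""

# Unquoted expansion: the JSON text is split on whitespace into several words.
unquoted=$(count_args --key-file ${creds_contents} ${bucket} ${folder})

# Quoted expansion: each variable stays a single argument, whatever it contains.
quoted=$(count_args --key-file "${creds_contents}" "${bucket}" "${folder}")

# Unquoted expansion of an empty variable: the argument disappears entirely.
missing=$(count_args --key-file ${creds_path} ${empty_bucket} ${folder})

echo "unquoted: ${unquoted}, quoted: ${quoted}, missing: ${missing}"

# In start_mounting.sh the quoted form would read (not executed here):
# gcsfuse --key-file "${GOOGLE_APPLICATIONS_CREDENTIALS}" "${GCS_BUCKET}" "${GCS_BUCKET_FOLDER}"
```

With quoting, the command always receives the flag, its value, the bucket, and the mount point as four separate arguments. Pointing GOOGLE_APPLICATIONS_CREDENTIALS at a mounted file path, as in the solution above, also keeps the value free of whitespace in the first place.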