I have a Docker image that contains an analysis pipeline. To run the pipeline, I need to provide input data, and I want to keep the outputs. Other users besides myself must be able to run this pipeline on their own laptops.
Briefly, my root (/) folder structure is as follows:
total 72
drwxr-xr-x 1 root root 4096 May 29 15:38 bin
drwxr-xr-x 2 root root 4096 Feb 1 17:09 boot
drwxr-xr-x 5 root root 360 Jun 1 15:31 dev
drwxr-xr-x 1 root root 4096 Jun 1 15:31 etc
drwxr-xr-x 2 root root 4096 Feb 1 17:09 home
drwxr-xr-x 1 root root 4096 May 29 15:49 lib
drwxr-xr-x 2 root root 4096 Feb 24 00:00 lib64
drwxr-xr-x 2 root root 4096 Feb 24 00:00 media
drwxr-xr-x 2 root root 4096 Feb 24 00:00 mnt
drwxr-xr-x 1 root root 4096 Mar 12 19:38 opt
drwxr-xr-x 1 root root 4096 Jun 1 15:24 pipeline
dr-xr-xr-x 615 root root 0 Jun 1 15:31 proc
drwx------ 1 root root 4096 Mar 12 19:38 root
drwxr-xr-x 3 root root 4096 Feb 24 00:00 run
drwxr-xr-x 1 root root 4096 May 29 15:38 sbin
drwxr-xr-x 2 root root 4096 Feb 24 00:00 srv
dr-xr-xr-x 13 root root 0 Apr 29 10:14 sys
drwxrwxrwt 1 root root 4096 Jun 1 15:25 tmp
drwxr-xr-x 1 root root 4096 Feb 24 00:00 usr
drwxr-xr-x 1 root root 4096 Feb 24 00:00 var
The pipeline scripts are in /pipeline and are packaged into the image with a "COPY . /pipeline" instruction in my Dockerfile.
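The relevant part of my Dockerfile looks roughly like this (the base image here is just a placeholder):

FROM debian:stable    # placeholder; the actual base image doesn't matter here
WORKDIR /pipeline
COPY . /pipeline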
For various reasons, this legacy pipeline is set up so that the input data must be in a folder such as /pipeline/project. To run my pipeline, I use:
docker run --rm --mount type=bind,source="$(pwd)",target=/pipeline/project --user "$(id -u):$(id -g)" pipelineimage:v1
In other words, I mount a folder with the data to /pipeline/project. I found I needed to use --user to ensure the output files would have the correct permissions - i.e. that I would have read/write/exec access on my host computer after the container exits.
The pipeline runs, but I have one issue: one particular piece of software used by the pipeline automatically tries to create (and I can't change that) one folder in $HOME (which is / here, as shown above) and one folder in my WORKDIR (which I set in my Dockerfile to /pipeline). These attempts fail, and I'm guessing it's because I am not running the pipeline as root. But I need --user to make sure my outputs have the correct permissions - i.e. that I don't need sudo rights to read them.
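To illustrate, something like the following reproduces the failure (the directory names here are hypothetical, the real tool picks its own, and this assumes the image's command can be overridden):

docker run --rm --user "$(id -u):$(id -g)" pipelineimage:v1 \
    sh -c 'mkdir "$HOME/.toolcache" && mkdir ./toolcache'
# fails with "Permission denied" because / and /pipeline are owned by root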
My question is: how am I meant to handle this? Using --user gives me the correct permissions for the mounted folder (/pipeline/project), where many output files are successfully made, no problems there. But how can I ensure the other 2 folders are correctly made outside of that mount?
I have tried a few things, but without success. Am I missing something? I don't understand why it's "easy" to handle permissions for a mounted folder but so much harder for the other folders in a container. Thanks.
If your software doesn't rely on relative paths (~/, ./), you can just set $HOME and WORKDIR to a directory that any user can write to:
ENV HOME=/tmp
WORKDIR /tmp
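As a quick sanity check (a sketch, assuming the image's command can be overridden), any uid should now be able to write in both places, since /tmp is world-writable:

docker run --rm --user 12345:12345 pipelineimage:v1 \
    sh -c 'touch "$HOME/probe-home" ./probe-cwd && echo ok'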
If you can't do that, you can pass the uid/gid via the environment to an entrypoint script running as root, chown/chmod as necessary, then drop privileges to run the pipeline (runuser, su, sudo, setuidgid).
For example (untested):
entrypoint.sh
#!/bin/bash
# Fail fast if the caller didn't provide the host uid/gid.
[[ -v RUN_UID ]] || { echo "unset RUN_UID" >&2; exit 1; }
[[ -v RUN_GID ]] || { echo "unset RUN_GID" >&2; exit 1; }
# chown, chmod, set env, etc.
chown "$RUN_UID:$RUN_GID" "/path/that/requires/write/permissions"
export HOME=/tmp
# Run the pipeline as a non-root user; exec replaces this shell
# so signals reach the pipeline directly.
exec sudo -E -u "#$RUN_UID" -g "#$RUN_GID" /path/to/pipeline
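If sudo isn't installed in the image, setpriv (from util-linux) or gosu can do the same job; for example, the last line could instead be:

exec setpriv --reuid "$RUN_UID" --regid "$RUN_GID" --clear-groups /path/to/pipeline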
Dockerfile
...
COPY entrypoint.sh /usr/local/bin/entrypoint.sh
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
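COPY preserves the file mode from the build context, so if entrypoint.sh isn't already executable there, also add:

RUN chmod +x /usr/local/bin/entrypoint.sh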
Finally, pass the user and group IDs via the environment when running. Note that --user is dropped here: the container must start as root so the entrypoint can chown before dropping privileges:
docker run --rm --mount type=bind,source="$(pwd)",target=/pipeline/project -e RUN_UID=$(id -u) -e RUN_GID=$(id -g) pipelineimage:v1
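Afterwards, the outputs in the mounted folder should be owned by your host user, which you can confirm with:

ls -ln .    # numeric uid/gid should match id -u and id -g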