I have set up my Dockerfile; it looks like this:
FROM python:3.6
ARG label
ARG seeds
ARG dataset_name=${label}_terms
RUN mkdir /prodigy
WORKDIR /prodigy
COPY ./prodigy-1.8.1-cp35.cp36.cp37-cp35m.cp36m.cp37m-linux_x86_64.whl /prodigy
RUN pip install prodigy-1.8.1-cp35.cp36.cp37-cp35m.cp36m.cp37m-linux_x86_64.whl
RUN pip install -U spacy
RUN python -m spacy download en_core_web_lg
EXPOSE 8080
RUN mkdir /work
ENV PRODIGY_HOME /work
WORKDIR /work
COPY ./prodigy.json /work
RUN prodigy dataset ${dataset_name}
ENV LABEL=${label}
ENV SEEDS=${seeds}
CMD prodigy terms.teach ${LABEL}_terms en_core_web_lg --seed "$SEEDS"
It works, but not as expected. I expected the CMD command to run just once. Instead, it shows up as three separate processes (ps aux output):
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 4280 692 ? Ss 08:47 0:00 /bin/sh -c prodigy terms.teach ${LABEL}_terms en_core_web_lg --seed "$SEEDS"
root 8 0.0 0.0 4280 740 ? S 08:47 0:00 /bin/sh /usr/local/bin/prodigy terms.teach TRANSFER_terms en_core_web_lg --seed transfer, relocation, relegation
root 9 46.1 13.7 2329976 1687016 ? Sl 08:47 15:13 python -m prodigy terms.teach TRANSFER_terms en_core_web_lg --seed transfer, relocation, relegation
I wonder whether this is standard behavior. How can I make my Dockerfile clean?
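(Without looking into the details of the command you're running) I suspect that the prodigy command itself spawns a new shell / subcommands. In your ps output, PID 8 is /bin/sh running /usr/local/bin/prodigy, which suggests that prodigy is a shell wrapper script that in turn launches python -m prodigy (PID 9).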
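From the list of processes, PID 1 is the process that Docker runs as the container's main process. Because your CMD is written in shell form, Docker wraps it in /bin/sh -c, which is why PID 1 is a shell rather than prodigy itself. The other processes are child processes started, directly or indirectly, by that main process, so the command is only being run once.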
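If you would rather see a single process, one option is to keep the shell form (so that ${LABEL} and $SEEDS are still expanded at container start) but prefix the command with exec, so the /bin/sh -c wrapper is replaced by the command instead of staying around as its parent. The line below is only a sketch: it assumes the prodigy wrapper does nothing more than start python -m prodigy, which is what your ps output suggests but which I have not verified.

```dockerfile
# Shell form: Docker still runs this through /bin/sh -c, so the
# variables are expanded by the shell when the container starts.
# `exec` then replaces that shell with the Python process, which
# becomes PID 1 instead of a grandchild of the wrapper shell.
# Assumption: calling `python -m prodigy` directly is equivalent to
# calling the `prodigy` wrapper script (inferred from the ps output).
CMD exec python -m prodigy terms.teach "${LABEL}_terms" en_core_web_lg --seed "$SEEDS"
```

The exec form of CMD (the JSON array syntax) would also avoid the wrapper shell, but it does not go through a shell at all, so ${LABEL} and $SEEDS would not be expanded when the container starts. That is why this sketch stays with the shell form.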