I have this Dockerfile based on Alpine that installs several packages with conda. At the end it installs pydrill with pip, as there is no conda package for it.
from jcrist/alpine-dask
RUN /opt/conda/bin/conda update -n base -c defaults conda -y
RUN /opt/conda/bin/conda update dask
RUN /opt/conda/bin/conda install -c conda-forge dask-ml
RUN /opt/conda/bin/conda install scikit-learn -y
RUN /opt/conda/bin/conda install flask -y
RUN /opt/conda/bin/conda install waitress -y
RUN /opt/conda/bin/conda install gunicorn -y
RUN /opt/conda/bin/conda install pytest -y
RUN /opt/conda/bin/conda install apscheduler -y
RUN /opt/conda/bin/conda install matplotlib -y
RUN /opt/conda/bin/conda install pyodbc -y
USER root
RUN apk update
RUN apk add py-pip
RUN pip install pydrill
When I build the Docker image everything works fine. But when I run the container, the command line starts gunicorn, which then fails with the following message:
  File "/code/app/service/cm/exec/run_drill.py", line 1, in <module>
    from pydrill.client import PyDrill
ModuleNotFoundError: No module named 'pydrill'
Is this pip installation correct? This is the Docker Compose file:
version: "3.0"
services:
  web:
    image: img-dask
    volumes:
      - vol_py_code:/code
      - vol_dask_data:/data
      - vol_dask_model:/model
    ports:
      - "5000:5000"
    working_dir: /code
    environment:
      - app.config=/code/conf/py.app.json
      - common.config=/code/conf/py.common.json
    entrypoint:
      - /opt/conda/bin/gunicorn
    command:
      - -b 0.0.0.0:5000
      - --reload
      - app.frontend.app:app
  scheduler:
    image: img-dask
    ports:
      - "8787:8787"
      - "8786:8786"
    entrypoint:
      - /opt/conda/bin/dask-scheduler
  worker:
    image: img-dask
    depends_on:
      - scheduler
    environment:
      - PYTHONPATH=/code
      - MODEL_PATH=/model/rfc_model.pkl
      - PREPROCESSING_PATH=/model/data_columns.pkl
      - SCHEDULER_ADDRESS=scheduler
      - SCHEDULER_PORT=8786
    volumes:
      - vol_py_code:/code
      - vol_dask_data:/data
      - vol_dask_model:/model
    entrypoint:
      - /opt/conda/bin/dask-worker
    command:
      - scheduler:8786
volumes:
  vol_py_code:
    name: vol_py_code
  vol_dask_data:
    name: vol_dask_data
  vol_dask_model:
    name: vol_dask_model
UPDATE
If I open a shell inside the container, I can see that pydrill is installed, but my code does not see the library.
/code/conf # pip3 list
Package    Version
---------- ---------
certifi    2020.12.5
chardet    4.0.0
idna       2.10
pip        18.1
pydrill    0.3.4
requests   2.25.1
setuptools 40.6.2
urllib3    1.26.4
You are using pip version 18.1, however version 21.1.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
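The problem is that pydrill and the conda packages are installed in different Python environments. The pip added with apk add py-pip belongs to Alpine's system Python, so pip install pydrill puts pydrill into the system site-packages (which is why pip3 list shows pydrill but none of the conda packages). gunicorn, however, is started from /opt/conda/bin and runs under conda's Python, which sees only the conda packages and not pydrill.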
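One way to confirm this from inside the container is to ask each interpreter where it would import pydrill from. The following is a minimal sketch; the helper name describe_module and the file name check_env.py are made up for illustration and are not part of pydrill or conda.

```python
import importlib.util
import sys


def describe_module(name):
    """Return the file this interpreter would import `name` from,
    or None if the module is not visible to this interpreter."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # sys.executable shows which Python is actually running this script.
    print("interpreter:", sys.executable)
    print("pydrill:", describe_module("pydrill"))
```

Saved as check_env.py (a hypothetical name), it can be run once with /opt/conda/bin/python check_env.py and once with the system interpreter (presumably python3, given that pip3 is present). If the explanation above holds, only the system interpreter should report a path for pydrill.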
To fix the issue, install pip itself into a conda environment and use that pip to install pydrill:
from jcrist/alpine-dask
USER root
RUN /opt/conda/bin/conda create -p /pyenv -y
RUN /opt/conda/bin/conda install -p /pyenv dask scikit-learn flask waitress gunicorn \
pytest apscheduler matplotlib pyodbc -y
RUN /opt/conda/bin/conda install -p /pyenv -c conda-forge dask-ml -y
RUN /opt/conda/bin/conda install -p /pyenv pip -y
RUN /pyenv/bin/pip install pydrill
Note that with this Dockerfile the packages live in the /pyenv environment rather than in /opt/conda, so the entrypoints in the compose file would need to point at /pyenv/bin (for example /pyenv/bin/gunicorn, and likewise for dask-scheduler and dask-worker) instead of /opt/conda/bin.