python docker docker-compose pip alpine-linux

Install pydrill in Docker image


I have this Dockerfile, based on Alpine, that installs several packages with conda. At the end it installs pydrill with pip, because there is no conda package for it.

FROM jcrist/alpine-dask

RUN /opt/conda/bin/conda update -n base -c defaults conda -y
RUN /opt/conda/bin/conda update dask -y
RUN /opt/conda/bin/conda install -c conda-forge dask-ml -y
RUN /opt/conda/bin/conda install scikit-learn -y
RUN /opt/conda/bin/conda install flask -y
RUN /opt/conda/bin/conda install waitress -y
RUN /opt/conda/bin/conda install gunicorn -y
RUN /opt/conda/bin/conda install pytest -y
RUN /opt/conda/bin/conda install apscheduler -y
RUN /opt/conda/bin/conda install matplotlib -y
RUN /opt/conda/bin/conda install pyodbc -y

USER root
RUN apk update
RUN apk add py-pip
RUN pip install pydrill

When I build the Docker image, everything works fine. But when I run the container, gunicorn starts and then fails with the following message:

  File "/code/app/service/cm/exec/run_drill.py", line 1, in <module>
    from pydrill.client import PyDrill
ModuleNotFoundError: No module named 'pydrill'

Is this pip installation correct? This is the docker-compose file:

version: "3.0"
services:

  web:
    image: img-dask
    volumes:
      - vol_py_code:/code
      - vol_dask_data:/data
      - vol_dask_model:/model
    ports:
      - "5000:5000"
    working_dir: /code
    environment:
      - app.config=/code/conf/py.app.json
      - common.config=/code/conf/py.common.json     
    entrypoint:
      - /opt/conda/bin/gunicorn
    command:
      - -b 0.0.0.0:5000
      - --reload
      - app.frontend.app:app


  scheduler:
    image: img-dask
    ports:
      - "8787:8787"
      - "8786:8786"
    entrypoint:
      - /opt/conda/bin/dask-scheduler

  worker:
    image: img-dask
    depends_on:
      - scheduler
    environment:
      - PYTHONPATH=/code
      - MODEL_PATH=/model/rfc_model.pkl
      - PREPROCESSING_PATH=/model/data_columns.pkl
      - SCHEDULER_ADDRESS=scheduler
      - SCHEDULER_PORT=8786
    volumes:
      - vol_py_code:/code
      - vol_dask_data:/data
      - vol_dask_model:/model
    entrypoint:
      - /opt/conda/bin/dask-worker
    command:
      - scheduler:8786
      
volumes:
  vol_py_code:
     name: vol_py_code
  vol_dask_data:
     name: vol_dask_data
  vol_dask_model:
     name: vol_dask_model
  

UPDATE

If I open a shell inside the container, I can see that pydrill is installed, but my code does not see the library.

/code/conf # pip3 list
Package    Version  
---------- ---------
certifi    2020.12.5
chardet    4.0.0    
idna       2.10     
pip        18.1     
pydrill    0.3.4    
requests   2.25.1   
setuptools 40.6.2   
urllib3    1.26.4   
You are using pip version 18.1, however version 21.1.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
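
The pip3 listing above shows what one interpreter has installed, but not which interpreter the application runs on. A minimal sketch of a check that can be run with each Python in the container follows; the function name describe_module is hypothetical and not part of pydrill or conda.

```python
import importlib.util
import sys


def describe_module(name):
    """Report which interpreter is running and whether it can find `name`."""
    # find_spec returns None when a top-level module is not on this interpreter's path.
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "prefix": sys.prefix,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    # "pydrill" is the module from the question; change it to check another one.
    for key, value in describe_module("pydrill").items():
        print(f"{key}: {value}")
```

Saved under a hypothetical name such as check_module.py, it can be run once with /opt/conda/bin/python and once with the interpreter behind the pip used in the Dockerfile. If the two runs disagree on "found", the package was installed for a different Python than the one gunicorn uses.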

Solution

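  • The problem is that pydrill and the conda packages live in different Python environments. apk add py-pip installs pip for Alpine's system Python, so pip install pydrill puts pydrill into the system site-packages, while gunicorn is started from /opt/conda/bin and runs on conda's Python. When the server starts, it sees only the conda packages, not pydrill.

    To fix the issue, install pip itself into the conda environment and use that pip to install pydrill: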

    FROM jcrist/alpine-dask
    
    USER root
    RUN /opt/conda/bin/conda create -p /pyenv -y
    RUN /opt/conda/bin/conda install -p /pyenv dask scikit-learn flask waitress gunicorn \
        pytest apscheduler matplotlib pyodbc -y
    RUN /opt/conda/bin/conda install -p /pyenv -c conda-forge dask-ml -y
    RUN /opt/conda/bin/conda install -p /pyenv pip -y
    RUN /pyenv/bin/pip install pydrill
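
To confirm that the rebuilt image behaves as intended, the import can be tried from each interpreter in turn. The following is a rough sketch only: can_import is a hypothetical helper, /usr/bin/python3 is an assumed location for Alpine's system Python, and the other two paths are taken from the Dockerfiles in this thread.

```python
import subprocess
import sys


def can_import(interpreter, module):
    """Return True/False if `interpreter` can import `module`, or None if it cannot be run."""
    try:
        result = subprocess.run(
            [interpreter, "-c", f"import {module}"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        return None  # interpreter path does not exist or is not executable
    return result.returncode == 0


if __name__ == "__main__":
    # Assumed interpreter locations; adjust them to match the actual image.
    for python in ("/usr/bin/python3", "/opt/conda/bin/python", "/pyenv/bin/python"):
        print(python, "->", can_import(python, "pydrill"))
```

With the Dockerfile above, only /pyenv/bin/python would be expected to report True. Because everything is now installed under /pyenv, the docker-compose entrypoints presumably need to point at /pyenv/bin/gunicorn, /pyenv/bin/dask-scheduler and /pyenv/bin/dask-worker rather than at /opt/conda/bin.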