Tags: docker, flask, gunicorn, torch

Cannot launch a gunicorn Flask app with a torch model in Docker


Does anyone have a working example of a Docker setup that uses GPU, torch, gunicorn, and Flask in one application? Torch 1.4.0 throws an exception. Please find the configuration below.

Dockerfile:

FROM nvidia/cuda:10.2-base-ubuntu18.04

# Install some basic utilities
RUN apt-get update && apt-get install -y \
    curl \
    ca-certificates \
    sudo \
    git \
    bzip2 \
    libx11-6 \
 && rm -rf /var/lib/apt/lists/*

# Create a working directory
RUN mkdir /app
WORKDIR /app

RUN apt-get update

RUN apt-get install -y curl python3.7 python3.7-dev python3.7-distutils

# Register the version in alternatives
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.7 1

# Set python 3 as the default python
RUN update-alternatives --set python /usr/bin/python3.7

# Upgrade pip to latest version
RUN curl -s https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \
    python get-pip.py --force-reinstall && \
    rm get-pip.py


COPY requirements.txt .
RUN pip --no-cache-dir install -r requirements.txt
RUN pip install torch torchvision

WORKDIR /usr/src/app

COPY . ./

CMD python ./new_main.py --workers 1

And the new_main.py:

import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--test", action='store_true')
    parser.add_argument("--workers", type=int, default=1)
    args = parser.parse_args()
    number_of_GPU_workers = 1  # default, so the name is always bound
    if check_test_mode(args.test):  # check_test_mode and port are defined elsewhere in the app
        number_of_GPU_workers = args.workers or 1
    options = {
        'bind': '%s:%s' % ('0.0.0.0', str(port)),
        'workers': number_of_GPU_workers,
        'timeout': 300
    }
    StandaloneApplication(app, options).run()



The route I am using:

@app.route("/api/work", methods=["POST"])
def work():
    try:
        body = request.get_json()
        if app.worker is None:
            app.worker = worker()
            app.worker.load_models() 
        ...
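The `app.worker is None` check above is lazy per-worker initialization: the model is loaded on the first request each worker process handles, not at import time. Stripped of Flask and torch, the pattern looks like this (`load_model` is a hypothetical stand-in for the real loader):

```python
_model = None  # one instance per worker process


def load_model():
    # Hypothetical stand-in for the real torch model load
    return {"name": "best-model"}


def get_model():
    # Load lazily, inside the worker, so the gunicorn master process
    # never touches CUDA before workers are created
    global _model
    if _model is None:
        _model = load_model()
    return _model
```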

And here it throws the exception:

2020-04-09 11:33:33,544 loading file /mnt/models/best-model
Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

Command that I am using:

sudo docker run    -p 8889:8888   -e MODELSLOCATION=/mnt/models   --gpus all -v $MODELSLOCATION:/mnt/models cc14ffc68256

Solution

  • For torch 1.4.0, the following solution works for me: launch the Flask app from a separate function, init.

    What is most important: you need to put import torch inside this function and remove any other occurrence of it from the Flask launch file. Torch 1.4.0 has some issues with multiprocessing.

    import argparse

    def init():
        if __name__ == '__main__':
            import torch  # imported here so the start method is set before any CUDA use
            parser = argparse.ArgumentParser()
            parser.add_argument("--test", action='store_true')
            parser.add_argument("--workers", type=int, default=1)
            args = parser.parse_args()
            # 'spawn' gives every worker a fresh interpreter, avoiding the
            # "Cannot re-initialize CUDA in forked subprocess" error
            torch.multiprocessing.set_start_method('spawn')
            number_of_GPU_workers = 1  # default, so the name is always bound
            if check_test_mode(args.test):
                number_of_GPU_workers = args.workers or 1
            options = {
                'bind': '%s:%s' % ('0.0.0.0', str(port)),
                'workers': number_of_GPU_workers,
                'timeout': 900
            }
            StandaloneApplication(app, options).run()

    init()
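The fork/spawn distinction behind the error can be seen with the standard library alone: a spawned child starts a fresh interpreter, so library state such as CUDA is initialized from scratch, while a forked child inherits the parent's already-initialized state. A minimal sketch, independent of torch:

```python
import multiprocessing as mp


def child(conn):
    # A 'spawn' child runs in a fresh interpreter: nothing (CUDA included)
    # is inherited from the parent, only what is re-imported and pickled in
    conn.send(mp.current_process().name)
    conn.close()


if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    parent_conn, child_conn = ctx.Pipe()
    p = ctx.Process(target=child, args=(child_conn,), name='spawned-worker')
    p.start()
    assert parent_conn.recv() == 'spawned-worker'
    p.join()
```

Note that gunicorn itself still forks its workers from the master; the fix works because import torch happens only after worker setup, so the master never initializes CUDA before forking.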