Tags: docker, google-app-engine, google-cloud-platform, geodjango

Deploying Django to App Engine Flexible Environment - Timeout Error Response: [4]


I am attempting to deploy my app to the App Engine flexible environment. The Docker image builds fine, but the deployment fails at what I think is the step where the service is made live. My build timeout is set to 1200 seconds, for what it's worth.

How do I investigate this error further? I am struggling to find where in the logs, or anywhere else in GCP, I can see exactly which process is getting stuck. The error is completely opaque and gives no indication of what is going wrong. Could there be an error in the application (which runs fine locally)? If so, I would expect the deployment to still succeed and the error to show up only when I access the website.

Any help greatly appreciated.

Error:

OperationError: Error Response: [4] Your deployment has failed to become healthy in the allotted time and therefore was rolled back. If you believe this was an error, try adjusting the 'app_start_timeout_sec' setting in the 'readiness_check' section.
ERROR: (gcloud.app.deploy) Error Response: [4] Your deployment has failed to become healthy in the allotted time and therefore was rolled back. If you believe this was an error, try adjusting the 'app_start_timeout_sec' setting in the 'readiness_check' section.
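One way to narrow this down is to reproduce a readiness-style check locally: start the image with something like docker run -p 8080:8080 -e PORT=8080 <your-image> and poll it until it answers. The sketch below is only an approximation of App Engine's health checking, not its implementation; the URL, the "/" path, and the 300-second budget are assumptions. If the app never becomes healthy locally either, the problem is inside the container (for example, no server process listening on the port) rather than in App Engine.

```python
import http.client
import time
import urllib.request


def wait_until_healthy(url: str, timeout_sec: float = 300.0, interval_sec: float = 2.0) -> bool:
    """Poll url until it returns HTTP 200 or timeout_sec elapses.

    A local approximation of a readiness check (assumed path and budget),
    not App Engine's actual implementation.
    """
    # Ignore any http_proxy settings: talk to the local container directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=interval_sec) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused (nothing listening yet), an HTTP error status
            # (urllib raises HTTPError, an OSError subclass), or a slow response.
            pass
        time.sleep(interval_sec)
    return False
```

For example, wait_until_healthy("http://localhost:8080/") while the container is running. If it returns False, docker logs <container-id> usually shows why the server never started.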

This is my Dockerfile:

FROM gcr.io/google-appengine/python

RUN apt-get update && apt-get install software-properties-common -y
RUN add-apt-repository ppa:ubuntugis/ppa

RUN apt-get install -y gdal-bin


# Create a virtualenv for dependencies. This isolates these packages from
# system-level packages.
# Use -p python3 or -p python3.7 to select the Python version. The default is Python 2.
RUN virtualenv /env -p python3.7

# Setting these environment variables is the same as running
# source /env/bin/activate.
ENV VIRTUAL_ENV /env
ENV PATH /env/bin:$PATH

# Copy the application's requirements.txt and run pip to install all
# dependencies into the virtualenv
COPY requirements.txt /tmp
WORKDIR /tmp
RUN pip install -r requirements.txt

# Add the application source code.
ADD . /

EXPOSE 8080
# Run a WSGI server to serve the application. gunicorn must be declared as
# a dependency in requirements.txt.
#CMD gunicorn -b :$PORT main:app

And this is my app.yaml:

runtime: custom
env: flex

runtime_config:
  # You can also specify 2 for Python 2.7
  python_version: 3.7

Solution

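  • I'm almost certain this is caused by gunicorn timing out. By default gunicorn kills and restarts any worker that stays silent for more than 30 seconds, so an app with slow startup or slow requests can keep getting its workers killed before it ever responds successfully.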

    To disable gunicorn's worker timeout, change the last command in your Dockerfile to:

    CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 main:app --timeout 0
    

    Here, --workers 1 --threads 8 means one worker process with 8 threads. (If you don't specify resources manually, the default is 1 CPU core.) If you decide to use more cores, adjust the workers and threads accordingly, but that is a bit out of scope for this question.

    The important part is --timeout 0, which disables gunicorn's worker timeout entirely.
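    The same settings can also live in a gunicorn config file, which is plain Python. A minimal sketch, assuming a file named gunicorn.conf.py next to main.py: bind, workers, threads and timeout are gunicorn's own setting names, while the 8080 fallback for running outside App Engine is an assumption.

```python
# gunicorn.conf.py: sketch of the command-line flags above as a config file.
import os

# Listen on all interfaces, on the port passed in via $PORT
# (falls back to 8080 when PORT is not set, e.g. local runs).
bind = "0.0.0.0:" + os.environ.get("PORT", "8080")

# One worker process with 8 threads (gunicorn switches the sync worker
# to the threaded gthread worker when threads > 1).
workers = 1
threads = 8

# 0 disables the worker timeout entirely (gunicorn's default is 30 seconds).
timeout = 0
```

    The Dockerfile command would then be CMD exec gunicorn -c gunicorn.conf.py main:app. For a Django project the WSGI target is usually <project>.wsgi:application rather than main:app.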

    If you still see the error, one small addition will most likely fix it: add the --preload flag when starting gunicorn. The last command in the Dockerfile then becomes:

    CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 main:app --timeout 0 --preload
    

    --preload loads the application code in the gunicorn master process before the worker processes are forked, so all imports and one-time preprocessing happen when the instance hosting your container starts, not on the first request. This is especially useful for apps that spend a lot of time on one-time preprocessing: by the time a request arrives, everything is already loaded and ready to serve it.

    To get the most out of --preload, also move all imports and one-time setup to module level at the top of your main app, and avoid importing inside route handlers.
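    As an illustration of that layout, here is a minimal WSGI module, using plain WSGI rather than Django to keep it self-contained. _load_reference_data and REFERENCE_DATA are hypothetical names standing in for whatever slow one-time work your app does. With --preload, gunicorn imports this module once in the master before forking workers, so the work runs once at startup instead of inside a request.

```python
"""main.py: minimal WSGI app showing import-time setup for --preload.

_load_reference_data is a hypothetical stand-in for slow one-time work
(importing heavy libraries, loading data files, building lookup tables).
"""
import json
import time


def _load_reference_data() -> dict:
    # Hypothetical expensive step; the sleep pretends this takes a while.
    time.sleep(0.1)
    return {"status": "ready", "loaded_at": time.time()}


# Runs once, when the module is imported. With --preload, that import
# happens in the master process, so every forked worker starts with
# REFERENCE_DATA already in memory.
REFERENCE_DATA = _load_reference_data()


def app(environ, start_response):
    # No imports or heavy setup here: the request path only reads data
    # that was prepared at import time.
    body = json.dumps({"status": REFERENCE_DATA["status"]}).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```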

    Also, there's no point in defining the entrypoint command in both app.yaml and the Dockerfile. In my opinion it's better to keep it in the Dockerfile.

    Additionally:

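    I'd move EXPOSE 8080 to right after the FROM line so the port is declared up front. Keep in mind that EXPOSE only documents which port the container listens on; its position in the Dockerfile does not change networking. What actually makes the container reachable is gunicorn binding to :$PORT, which App Engine sets to 8080.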