python django google-cloud-platform postgis app-engine-flexible

Google App Engine instances behave differently with the exact same API call

I have a Django application with a DRF API deployed in a flexible environment on Google App Engine. I am using PostgreSQL with the PostGIS extension. After deployment, I have two instances running.

I have an API that makes use of GeoDjango to retrieve certain locations from my database. The exact same API call fails about 50% of the time. As I can see from the GCP Console logs, one of the GAE instances always works, while the other systematically returns a 500 with the following error:

ImportError: Could not find the GEOS library (tried "geos_c", "GEOS"). 
Try setting GEOS_LIBRARY_PATH in your settings.

However, the GEOS library is installed (see Dockerfile below). Any idea why the two instances behave differently, and what I can do to prevent that?

Dockerfile

# [START dockerfile]
FROM gcr.io/google_appengine/python

# Install libraries
RUN apt-get update && apt-get install -y \
  binutils \
  libproj-dev \
  gdal-bin \
  python-gdal

# Change the -p argument to use Python 2.7 if desired.
RUN virtualenv /env -p python3.6

# Set virtualenv environment variables. This is equivalent to running
# source /env/bin/activate.
ENV VIRTUAL_ENV /env
ENV PATH /env/bin:$PATH

ADD requirements.txt /app/
RUN pip install -r requirements.txt
ADD . /app/

CMD gunicorn -b :$PORT nlp.wsgi
# [END dockerfile]

app.yaml

# [START runtime]
runtime: custom
env: flex
entrypoint: gunicorn -b :$PORT nlp.wsgi

runtime_config:
  python_version: 3
# [END runtime]

Solution

The issue turned out to be completely unrelated to the GEOS library: the failing instance was running out of RAM. I solved it by increasing the instance resources in app.yaml and redeploying:

resources:
  cpu: 1
  memory_gb: 2
  disk_size_gb: 10
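
For anyone trying to confirm the same diagnosis, here is a minimal sketch of a check that could be logged at startup on each instance. The helper name geos_diagnostics is hypothetical and not part of Django. It reports what ctypes.util.find_library (the function Django uses to locate GEOS) returns for the two library names from the error message, together with the process's peak memory use. One plausible explanation for the misleading error, which I have not verified for this deployment, is that on Linux find_library starts helper programs such as ldconfig and returns None when they cannot be launched, which can happen when memory is exhausted.

```python
import ctypes.util
import resource


def geos_diagnostics(lib_names=("geos_c", "GEOS")):
    """Report which GEOS library names resolve and this process's peak memory.

    Hypothetical helper for debugging; the default names mirror the ones
    listed in the ImportError above.
    """
    # find_library returns a library name/path string, or None if nothing was found.
    found = {name: ctypes.util.find_library(name) for name in lib_names}
    # Peak resident set size; on Linux the unit is kilobytes.
    peak_rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return {"libraries": found, "peak_rss_kb": peak_rss_kb}
```

If the healthy instance reports a value for geos_c while the failing one reports None for both names with a peak memory figure close to the instance limit, that points to memory pressure rather than a missing package.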