django google-app-engine google-app-engine-python

Google app engine Django app is crashing intermittently but the log files aren't showing me why

I have a Django 4.0 application running on google app engine, and for the most part it works fine. However I have a particular page which seems to crash the application after I load the page several times. On my laptop I don't see this behavior, so I'm trying to debug what is going wrong when it is running on GAE but I don't have much visibility into what is happening. Watching the logs doesn't tell me anything interesting, just that the workers are shutting down and then that they are restarting:

gcloud app logs tail -s default

2022-01-26 16:02:38 default[fixeddev]  2022-01-26 08:02:38,933 common.views INFO     Application started
2022-01-26 16:03:40 default[fixeddev]  "GET /organization/clean_up_issues/ HTTP/1.1" 200
2022-01-26 16:03:56 default[fixeddev]  "GET /organization/clean_up_issues/ HTTP/1.1" 200
2022-01-26 16:04:10 default[fixeddev]  "GET /organization/clean_up_issues/ HTTP/1.1" 500
2022-01-26 16:04:15 default[fixeddev]  [2022-01-26 16:04:15 +0000] [12] [INFO] Handling signal: term
2022-01-26 16:04:15 default[fixeddev]  [2022-01-26 08:04:15 -0800] [22] [INFO] Worker exiting (pid: 22)
2022-01-26 16:04:15 default[fixeddev]  [2022-01-26 08:04:15 -0800] [25] [INFO] Worker exiting (pid: 25)
2022-01-26 16:04:15 default[fixeddev]  [2022-01-26 08:04:15 -0800] [27] [INFO] Worker exiting (pid: 27)
2022-01-26 16:09:49 default[fixeddev]  "GET /_ah/start HTTP/1.1" 200
2022-01-26 16:09:49 default[fixeddev]  [2022-01-26 16:09:49 +0000] [10] [INFO] Starting gunicorn 20.1.0
2022-01-26 16:09:49 default[fixeddev]  [2022-01-26 16:09:49 +0000] [10] [INFO] Listening at: http://0.0.0.0:8081 (10)
2022-01-26 16:09:49 default[fixeddev]  [2022-01-26 16:09:49 +0000] [10] [INFO] Using worker: gthread
2022-01-26 16:09:49 default[fixeddev]  [2022-01-26 16:09:49 +0000] [21] [INFO] Booting worker with pid: 21
2022-01-26 16:09:49 default[fixeddev]  [2022-01-26 16:09:49 +0000] [24] [INFO] Booting worker with pid: 24
2022-01-26 16:09:49 default[fixeddev]  [2022-01-26 16:09:49 +0000] [25] [INFO] Booting worker with pid: 25
2022-01-26 16:09:49 default[fixeddev]  [2022-01-26 16:09:49 +0000] [26] [INFO] Booting worker with pid: 26
2022-01-26 16:09:50 default[fixeddev]  OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
2022-01-26 16:09:50 default[fixeddev]  OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
2022-01-26 16:09:50 default[fixeddev]  OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
2022-01-26 16:09:50 default[fixeddev]  OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
2022-01-26 16:09:53 default[fixeddev]  2022-01-26 08:09:53,151 common.views INFO     Application started

Where do I go to get more visibility into what is actually happening during these crashes? I'm relatively new to GAE and feel like I'm debugging blind since I can't reproduce this issue on my local dev machine, and no exception is getting logged. Each crash just produces a 500.

Unrelated bonus round question: does anyone know how to deal with the OpenBLAS warnings? Is this a real issue or just a nuisance that I can't seem to suppress?

Solution

App engine error code 500 (Internal Server Error), almost always means that your python code threw an unhanded exception that was caught by the runtime.

Go to the App Engine dashboard and look for errors - client error, server error.

From the error message that you got, it appears that your app is using more memory than the one your instance can handle. This causes the error due to the lack of memory.

The only error you get is:

2022-01-26 16:09:50 default[fixeddev] OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k

Your app may still be functional. I recommend you to change the instance class to have more available memory and avoid the error.