I'm running a Django 1.8 project on a DigitalOcean VPS (512 MB RAM, 1 CPU, 20GB SSD). I have Nginx proxying traffic to gunicorn. Here is my gunicorn command (run via supervisor):
gunicorn my_web_app.wsgi:application --worker-class gevent --bind 127.0.0.1:8001 --timeout=1200
I noticed that when I upload a ~3-5 MB image to my web app, the gunicorn worker gets killed by the kernel's OOM killer:
Jan 16 12:39:46 dev-1 kernel: [663264.917312] Out of memory: Kill process 31093 (gunicorn) score 589 or sacrifice child
Jan 16 12:39:46 dev-1 kernel: [663264.917416] Killed process 31093 (gunicorn) total-vm:560020kB, anon-rss:294888kB, file-rss:8kB
I monitored the output from top, which shows the memory usage steadily increasing:
Top output before I upload the image (baseline):
top - 13:19:45 up 7 days, 16:54, 2 users, load average: 0.00, 0.03, 0.05
Tasks: 96 total, 1 running, 95 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.4 us, 0.2 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 501780 total, 298384 used, 203396 free, 17112 buffers
KiB Swap: 0 total, 0 used, 0 free. 72048 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
902 my_user 20 0 145336 32332 3900 S 0.0 6.4 0:01.00 gunicorn
About a minute into the upload (halfway mark):
top - 13:22:00 up 7 days, 16:56, 2 users, load average: 0.05, 0.03, 0.05
Tasks: 98 total, 1 running, 97 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.4 us, 0.2 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 501780 total, 313976 used, 187804 free, 18100 buffers
KiB Swap: 0 total, 0 used, 0 free. 77196 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
902 my_user 20 0 171492 40076 5352 S 0.0 8.0 0:01.33 gunicorn
Moments before gunicorn is killed:
top - 13:23:14 up 7 days, 16:57, 2 users, load average: 0.19, 0.07, 0.06
Tasks: 99 total, 3 running, 96 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.4 us, 0.2 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 501780 total, 341836 used, 159944 free, 18236 buffers
KiB Swap: 0 total, 0 used, 0 free. 90228 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
902 my_user 20 0 239184 52492 5568 R 80.9 10.5 0:01.65 gunicorn
And finally, the moment of the crash:
top - 13:23:15 up 7 days, 16:57, 2 users, load average: 0.19, 0.07, 0.06
Tasks: 99 total, 4 running, 95 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.4 us, 0.2 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 501780 total, 495800 used, 5980 free, 176 buffers
KiB Swap: 0 total, 0 used, 0 free. 31564 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
902 my_user 20 0 545520 284012 5264 R 80.1 56.6 0:02.74 gunicorn
What could be causing the gunicorn worker's memory usage to spike by more than 200 MB, even though I'm only uploading a file of less than 5 MB?
What is the Gunicorn server doing with the image once it receives it? I once encountered a specific corrupt JPEG image that triggered a memory leak in a (now old) version of ImageMagick, which eventually crashed the servers.
You can test whether the issue is Gunicorn or your code by creating a CLI entry point into the same code that accepts the file as a command-line argument. If you still get the memory leak when loading the file from the CLI, the issue is in your application code.
If you don't get the memory leak in your CLI test, the issue is with your Gunicorn configuration.
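For example, here is a minimal sketch of such an entry point as a standalone script, run from the project directory with the same virtualenv Gunicorn uses. The module, function, and settings names (my_web_app.settings, my_web_app.images.process_upload) are placeholders; swap in whatever your upload view actually calls.

# test_upload.py -- run the same image-handling code outside Gunicorn/Nginx.
# Usage: python test_upload.py /path/to/test-photo.jpg
import os
import sys

import django

# Load the project settings and initialise Django, just as manage.py would.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'my_web_app.settings')
django.setup()

# Placeholder import: replace with the function your upload view really calls.
from my_web_app.images import process_upload

if __name__ == '__main__':
    with open(sys.argv[1], 'rb') as f:
        process_upload(f)

Watch the process with top (or ps -o rss) while the script runs; if the resident memory climbs the same way, the leak is in the image-processing code rather than in Gunicorn.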
Also, go ahead and add some swap to the machine as a safety buffer; I would add 1 GB. DigitalOcean publishes instructions for adding swap. Just doing this may allow the upload to finish, and it's good practice on a small droplet anyway.