django, python-2.7, nginx, gunicorn, gevent

Out of memory: Kill process (gunicorn) score or sacrifice child


I'm running a Django 1.8 project on a DigitalOcean VPS (512 MB RAM, 1 CPU, 20GB SSD). I have Nginx proxying traffic to gunicorn. Here is my gunicorn command (run via supervisor):

gunicorn my_web_app.wsgi:application --worker-class gevent --bind 127.0.0.1:8001 --timeout=1200

I noticed that when I upload a ~3-5 MB image to my web app, the gunicorn worker crashes with this error:

Jan 16 12:39:46 dev-1 kernel: [663264.917312] Out of memory: Kill process 31093 (gunicorn) score 589 or sacrifice child
Jan 16 12:39:46 dev-1 kernel: [663264.917416] Killed process 31093 (gunicorn) total-vm:560020kB, anon-rss:294888kB, file-rss:8kB

I monitored the output from top, which showed the memory usage steadily increasing:

Top output before I upload the image (baseline):

top - 13:19:45 up 7 days, 16:54,  2 users,  load average: 0.00, 0.03, 0.05
Tasks:  96 total,   1 running,  95 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.4 us,  0.2 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:    501780 total,   298384 used,   203396 free,    17112 buffers
KiB Swap:        0 total,        0 used,        0 free.    72048 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                                                   
 902 my_user   20   0  145336  32332   3900 S  0.0  6.4   0:01.00 gunicorn

About a minute into the upload (halfway mark):

top - 13:22:00 up 7 days, 16:56,  2 users,  load average: 0.05, 0.03, 0.05
Tasks:  98 total,   1 running,  97 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.4 us,  0.2 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:    501780 total,   313976 used,   187804 free,    18100 buffers
KiB Swap:        0 total,        0 used,        0 free.    77196 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                                                   
  902 my_user   20   0  171492  40076   5352 S  0.0  8.0   0:01.33 gunicorn

Moments before gunicorn crashes:

top - 13:23:14 up 7 days, 16:57,  2 users,  load average: 0.19, 0.07, 0.06
Tasks:  99 total,   3 running,  96 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.4 us,  0.2 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:    501780 total,   341836 used,   159944 free,    18236 buffers
KiB Swap:        0 total,        0 used,        0 free.    90228 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                                                   
  902 my_user   20   0  239184  52492   5568 R 80.9 10.5   0:01.65 gunicorn

And finally, the moment of the crash:

top - 13:23:15 up 7 days, 16:57,  2 users,  load average: 0.19, 0.07, 0.06
Tasks:  99 total,   4 running,  95 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.4 us,  0.2 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:    501780 total,   495800 used,     5980 free,      176 buffers
KiB Swap:        0 total,        0 used,        0 free.    31564 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                                                   
  902 my_user   20   0  545520 284012   5264 R 80.1 56.6   0:02.74 gunicorn

What could be causing the gunicorn worker's memory usage to spike by more than 200 MB when I'm only uploading a file smaller than 5 MB?


Solution

  • What is the Gunicorn server doing with the image once it receives it? I once encountered a specific corrupt JPEG image that triggered a memory leak in a (now old) version of ImageMagick, which eventually crashed the servers.

    You can test whether the issue is Gunicorn or your code by creating a CLI entry point into the same code that accepts the file as a command-line argument (see the sketch after this answer). If you still get the memory leak when loading the file from the CLI, the issue is in your application code.

    If you don't get the memory leak in your CLI test, the issue is with your Gunicorn configuration.

    Also, go ahead and add some swap to the machine as a safety buffer; I would add 1 GB. Instructions for adding swap on Digital Ocean. Just doing this may allow the task to finish, and it is good practice anyway.
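
A minimal sketch of such a CLI harness, assuming a hypothetical process_uploaded_image helper that stands in for whatever your upload view actually does with the file (swap in your real function, module path, and settings module):

# check_image_memory.py -- standalone reproduction script.
# process_uploaded_image and my_app.images are hypothetical placeholders;
# replace them with the actual processing code your upload view calls.
import os
import resource
import sys

import django

# Load the same Django settings so the same code paths are exercised.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "my_web_app.settings")
django.setup()

from my_app.images import process_uploaded_image  # hypothetical import


def peak_rss_mb():
    # ru_maxrss is reported in kilobytes on Linux.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0


def main(path):
    print("peak RSS before: %.1f MB" % peak_rss_mb())
    with open(path, "rb") as f:
        process_uploaded_image(f)
    print("peak RSS after:  %.1f MB" % peak_rss_mb())


if __name__ == "__main__":
    main(sys.argv[1])

Run it as python check_image_memory.py /path/to/test_image.jpg with the same 3-5 MB image. If peak RSS jumps by hundreds of megabytes here too, the leak is in the application's image handling; if it stays flat, look at the Gunicorn/gevent configuration and how the upload is being buffered.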