Long Script Stops Running When Deployed - Django on nginx/gunicorn


I have a very long script that ingests a PDF, does a lot of processing, then returns a result. It runs perfectly on port 8000 with either

python manage.py runserver 0.0.0.0:8000

or

gunicorn --bind 0.0.0.0:8000 myproject.wsgi

However, when I run it via port 80 in "production", the script stops at a certain point with no errors and seemingly no holes in the logic. What confuses me most is that it stops in different places depending on the length/complexity of the document being processed. Short/simple ones complete with no issue, but a longer one stops in the middle.

I added a very detailed log file to debug the issue. If I process the same document several times, it stops in the same loop but at different, seemingly random places within that loop, which suggests this isn't a logic flaw (note that I'm writing and flushing the log). Furthermore, with a longer/more complex document it stops earlier in the process.
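
One way to tell a time-based kill from a logic error is to stamp each log line with the elapsed wall-clock time. The sketch below is a minimal, hypothetical helper (the name `ElapsedLogger` and the messages are assumptions, not part of the original script). If runs on different documents all stop at roughly the same elapsed time, that points to a timeout rather than a bug in the loop.

```python
import sys
import time


class ElapsedLogger:
    """Write log lines prefixed with the seconds elapsed since creation."""

    def __init__(self, stream=sys.stderr):
        # time.monotonic() is unaffected by system clock adjustments.
        self._start = time.monotonic()
        self._stream = stream

    def log(self, message):
        elapsed = time.monotonic() - self._start
        line = f"{elapsed:8.2f}s {message}"
        self._stream.write(line + "\n")
        # Flush right away so the last line survives if the process is killed.
        self._stream.flush()
        return line


if __name__ == "__main__":
    logger = ElapsedLogger()
    for page in range(3):
        # Placeholder for the real per-page processing step.
        logger.log(f"finished page {page}")
```

Passing an open file handle instead of the default `sys.stderr` writes the same lines to a log file.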

I'm deploying this using Django via gunicorn/nginx on DigitalOcean.

Is there some sort of built-in protection in any of the above that stops processes after a certain number of CPU cycles or amount of time, as a guard against infinite loops? That's the only thing I can think of; I'm otherwise out of ideas.

I'd really appreciate any help!


Solution

  • Figured it out. Gunicorn has a built-in timeout: a worker that stays silent for longer than the configured time is killed and restarted. The default (30 seconds, per gunicorn's documentation) was too short for my process. To fix it, add the "--timeout" option to "ExecStart" in the gunicorn systemd service file. On a standard Ubuntu 20.04 setup:

    sudo nano /etc/systemd/system/gunicorn.service
    

    Then add the --timeout option to ExecStart (I used 120 seconds in this example):

    ExecStart=/home/sammy/myprojectdir/myprojectenv/bin/gunicorn \
              --access-logfile - \
              --workers 3 \
              --timeout 120 \
              --bind unix:/run/gunicorn.sock \
              myproject.wsgi:application
    

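    After saving, reload systemd and restart the service (sudo systemctl daemon-reload, then sudo systemctl restart gunicorn) so the new ExecStart takes effect.

    I found the cause by looking at the systemd journal with "journalctl", which captures the service's stdout. To view the most recent 50 lines, enter the following in your terminal: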

    journalctl | tail -50
    

    In my case, I noticed an entry containing "[CRITICAL] WORKER TIMEOUT (pid:xxxxxx)"
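
As an alternative to the command-line flag, gunicorn can also read its settings from a Python configuration file. The sketch below mirrors the ExecStart options from this answer. The file name `gunicorn.conf.py` and its location are assumptions, and the values are simply the ones used in this example, not recommendations.

```python
# gunicorn.conf.py (assumed file name, placed in the project directory)
# Same settings as the ExecStart flags in the service file, as gunicorn settings.

bind = "unix:/run/gunicorn.sock"  # socket that nginx proxies to
workers = 3                       # number of worker processes
accesslog = "-"                   # "-" sends the access log to stdout
timeout = 120                     # seconds a worker may stay silent before it is killed and restarted
```

With a file like this, ExecStart could point gunicorn at it using the -c option, for example `gunicorn -c gunicorn.conf.py myproject.wsgi:application`, which keeps the timeout in the project rather than in the unit file.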