Tags: django, amazon-web-services, celery, amazon-elastic-beanstalk, django-celery

How to run a Celery worker with a Django app scalable by AWS Elastic Beanstalk?


How can I use Django with AWS Elastic Beanstalk so that Celery tasks also run, but on the main node only?


Solution

  • This is how I set up Celery with Django on Elastic Beanstalk, with scalability working fine.

    Please keep in mind that the 'leader_only' option for container_commands works only on environment rebuild or deployment of the app. If the service runs long enough, the leader node may be removed by Elastic Beanstalk. To deal with that, you may have to apply instance protection to your leader node. Check: http://docs.aws.amazon.com/autoscaling/latest/userguide/as-instance-termination.html#instance-protection-instance
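    To automate that, a minimal sketch in Python (boto3), not part of the original setup, could be run on the leader node; the Auto Scaling group name below is hypothetical -- look yours up in the EC2 Auto Scaling console:

    # Hedged sketch: mark the current instance as protected from scale-in,
    # so the autoscaler does not terminate the leader node.
    # Requires boto3, requests and the autoscaling:SetInstanceProtection permission.
    import boto3
    import requests

    # Hypothetical group name -- replace with your environment's actual ASG.
    ASG_NAME = 'awseb-e-xxxxxxxxxx-stack-AWSEBAutoScalingGroup-XXXXXXXXXXXX'

    # The EC2 instance metadata service returns the current instance id.
    instance_id = requests.get(
        'http://169.254.169.254/latest/meta-data/instance-id', timeout=2
    ).text

    autoscaling = boto3.client('autoscaling')
    autoscaling.set_instance_protection(
        InstanceIds=[instance_id],
        AutoScalingGroupName=ASG_NAME,
        ProtectedFromScaleIn=True,
    )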

    Add a bash script for the Celery worker and beat configuration.

    Add file root_folder/.ebextensions/files/celery_configuration.txt:

    #!/usr/bin/env bash
    
    # Get django environment variables
    celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g' | sed 's/%/%%/g'`
    celeryenv=${celeryenv%?}
    
    # Create celery configuration script
    celeryconf="[program:celeryd-worker]
    ; Set full path to celery program if using virtualenv
    command=/opt/python/run/venv/bin/celery worker -A django_app --loglevel=INFO
    
    directory=/opt/python/current/app
    user=nobody
    numprocs=1
    stdout_logfile=/var/log/celery-worker.log
    stderr_logfile=/var/log/celery-worker.log
    autostart=true
    autorestart=true
    startsecs=10
    
    ; Need to wait for currently executing tasks to finish at shutdown.
    ; Increase this if you have very long running tasks.
    stopwaitsecs = 600
    
    ; When resorting to sending SIGKILL to the program to terminate it,
    ; send SIGKILL to its whole process group instead,
    ; taking care of its children as well.
    killasgroup=true
    
    ; if rabbitmq is supervised, set its priority higher
    ; so it starts first
    priority=998
    
    environment=$celeryenv
    
    [program:celeryd-beat]
    ; Set full path to celery program if using virtualenv
    command=/opt/python/run/venv/bin/celery beat -A django_app --loglevel=INFO --workdir=/tmp -S django --pidfile /tmp/celerybeat.pid
    
    directory=/opt/python/current/app
    user=nobody
    numprocs=1
    stdout_logfile=/var/log/celery-beat.log
    stderr_logfile=/var/log/celery-beat.log
    autostart=true
    autorestart=true
    startsecs=10
    
    ; Need to wait for currently executing tasks to finish at shutdown.
    ; Increase this if you have very long running tasks.
    stopwaitsecs = 600
    
    ; When resorting to sending SIGKILL to the program to terminate it,
    ; send SIGKILL to its whole process group instead,
    ; taking care of its children as well.
    killasgroup=true
    
    ; if rabbitmq is supervised, set its priority higher
    ; so it starts first
    priority=998
    
    environment=$celeryenv"
    
    # Create the celery supervisord conf script
    echo "$celeryconf" | tee /opt/python/etc/celery.conf
    
    # Add configuration script to supervisord conf (if not there already)
    if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf; then
      echo "[include]" | tee -a /opt/python/etc/supervisord.conf
      echo "files: celery.conf" | tee -a /opt/python/etc/supervisord.conf
    fi
    
    # Reread the supervisord config
    supervisorctl -c /opt/python/etc/supervisord.conf reread
    
    # Update supervisord in cache without restarting all services
    supervisorctl -c /opt/python/etc/supervisord.conf update
    
    # Start/Restart celeryd through supervisord
    supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-beat
    supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-worker
    

    Take care of script execution during deployment, but only on the main node (leader_only: true). Add file root_folder/.ebextensions/02-python.config:

    container_commands:
      04_celery_tasks:
        command: "cat .ebextensions/files/celery_configuration.txt > /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh && chmod 744 /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh"
        leader_only: true
      05_celery_tasks_run:
        command: "/opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh"
        leader_only: true
    

    File root_folder/requirements.txt:

    celery==4.0.0
    django_celery_beat==1.0.1
    django_celery_results==1.0.1
    pycurl==7.43.0 --global-option="--with-nss"
    

    Configure Celery for the Amazon SQS broker (get your desired endpoint from the list at http://docs.aws.amazon.com/general/latest/gr/rande.html) in root_folder/django_app/settings.py:

    ...
    CELERY_RESULT_BACKEND = 'django-db'
    CELERY_BROKER_URL = 'sqs://%s:%s@' % (aws_access_key_id, aws_secret_access_key)
    # Due to an error in the lib, the N. Virginia region was used temporarily; set it back to Ireland ("eu-west-1") after the fix.
    CELERY_BROKER_TRANSPORT_OPTIONS = {
        "region": "eu-west-1",
        'queue_name_prefix': 'django_app-%s-' % os.environ.get('APP_ENV', 'dev'),
        'visibility_timeout': 360,
        'polling_interval': 1
    }
    ...
    
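    Since beat runs with the Django database scheduler (-S django, backed by django_celery_beat), schedules live in the database. As a minimal sketch, assuming a hypothetical task django_app.tasks.cleanup, you could register a periodic task from a data migration or the Django shell:

    # Hedged sketch: create a DB-backed schedule entry for django_celery_beat.
    from django_celery_beat.models import IntervalSchedule, PeriodicTask

    # Run every 10 minutes.
    schedule, _ = IntervalSchedule.objects.get_or_create(
        every=10,
        period=IntervalSchedule.MINUTES,
    )
    PeriodicTask.objects.get_or_create(
        name='cleanup-every-10-minutes',  # unique display name
        task='django_app.tasks.cleanup',  # hypothetical task path
        interval=schedule,
    )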

    Celery configuration for the django_app Django app.

    Add file root_folder/django_app/celery.py:

    from __future__ import absolute_import, unicode_literals
    import os
    from celery import Celery
    
    # set the default Django settings module for the 'celery' program.
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'django_app.settings')
    
    app = Celery('django_app')
    
    # Using a string here means the worker doesn't have to serialize
    # the configuration object to child processes.
    # - namespace='CELERY' means all celery-related configuration keys
    #   should have a `CELERY_` prefix.
    app.config_from_object('django.conf:settings', namespace='CELERY')
    
    # Load task modules from all registered Django app configs.
    app.autodiscover_tasks()
    

    Modify file root_folder/django_app/__init__.py:

    from __future__ import absolute_import, unicode_literals
    
    # This will make sure the app is always imported when
    # Django starts so that shared_task will use this app.
    from django_app.celery import app as celery_app
    
    __all__ = ['celery_app']
    
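    To verify the whole pipeline end to end, a minimal sketch (the add task below is hypothetical, not part of the original answer) in root_folder/django_app/tasks.py:

    from __future__ import absolute_import, unicode_literals

    from celery import shared_task

    @shared_task
    def add(x, y):
        # Trivial task used only to confirm that the worker consumes
        # messages from SQS and stores results in the django-db backend.
        return x + y

    From a Django shell, result = add.delay(2, 3) should enqueue a message on SQS, and result.get(timeout=30) should return 5 once the worker on the leader node has processed it.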

    Check also: