
Slow Celery Task Times


I'm using Django, Celery and RabbitMQ. I have a simple task that sends emails. The task works, but it's very slow.

For example, when I send 5,000 emails, all 5,000 messages reach RabbitMQ straight away as expected, but the workers then take around 30 minutes to process and clear them all.

Without Celery, the same 5,000 tasks take just a few minutes to process.

Have I misconfigured something? It would be very helpful if someone could spot the cause of the slowdown.

task.py

import logging

from celery.task import Task

# Model imports depend on your project layout, e.g.:
# from sms.models import Account, Gateway, Message, Transaction


class SendMessage(Task):
    name = "Sending SMS"
    max_retries = 10
    default_retry_delay = 3

    def run(self, message_id, gateway_id=None, **kwargs):
        logging.debug("About to send a message.")

        try:
            message = Message.objects.get(pk=message_id)
        except Message.DoesNotExist as exc:
            # The row may not be committed yet; retry shortly.
            raise self.retry(exc=exc)

        if gateway_id:
            gateway = Gateway.objects.get(pk=gateway_id)
        elif hasattr(message.billee, 'sms_gateway'):
            gateway = message.billee.sms_gateway
        else:
            gateway = Gateway.objects.all()[0]

        account = Account.objects.get(user=message.sender)
        if account._balance() >= message.length:
            response = gateway._send(message)

            if response.status == 'Sent':
                # Take credit from the user's account.
                transaction = Transaction(
                    account=account,
                    amount=-message.charge,
                )
                transaction.save()
                message.billed = True
                message.save()

settings.py

# Celery
BROKER_URL = 'amqp://admin:[email protected]:5672//'
CELERY_SEND_TASK_ERROR_EMAILS = True

Apache config

<VirtualHost *:80>
ServerName www.domain.com

DocumentRoot /srv/project/domain


WSGIDaemonProcess domain.com processes=2 threads=15 display-name=%{GROUP}
WSGIProcessGroup domain.com

WSGIScriptAlias / /srv/project/domain/apache/django.wsgi
ErrorLog /srv/project/logs/error.log
</VirtualHost>

conf

# Name of nodes to start, here we have a single node
#CELERYD_NODES="w1"
# or we could have three nodes:
CELERYD_NODES="w1 w2 w3"

# Where to chdir at start.
CELERYD_CHDIR="/srv/project/domain"

# How to call "manage.py celeryd_multi"
CELERYD_MULTI="$CELERYD_CHDIR/manage.py celeryd_multi"

# How to call "manage.py celeryctl"
CELERYCTL="$CELERYD_CHDIR/manage.py celeryctl"

# Extra arguments to celeryd
CELERYD_OPTS="--time-limit=900 --concurrency=8"

# %n will be replaced with the nodename.
CELERYD_LOG_FILE="/srv/project/logs/celery/%n.log"
CELERYD_PID_FILE="/srv/project/celery/%n.pid"

# Workers should run as an unprivileged user.
CELERYD_USER="root"
CELERYD_GROUP="root"

# Name of the projects settings module.
export DJANGO_SETTINGS_MODULE="domain.settings"

# Celery Beat Settings.

# Where to chdir at start.
CELERYBEAT_CHDIR="/srv/project/domain"

# Path to celerybeat
CELERYBEAT="$CELERYBEAT_CHDIR/manage.py celerybeat"

Solution

  • You are processing ~2.78 tasks/second (5,000 tasks in 30 minutes), which I agree isn't very high. You have 3 nodes, each running with a concurrency of 8, so you should be able to process 24 tasks in parallel.

    Things to check:

    CELERYD_PREFETCH_MULTIPLIER - This defaults to 4, but if you have lots of short tasks it can be worth increasing. A higher value reduces the overhead of fetching messages from the broker, at the cost of tasks being less evenly distributed across workers.
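    For example, in settings.py (the value here is illustrative; tune it for your workload):

    ```python
    # settings.py -- illustrative value; tune for your workload.
    # With many short tasks, a larger prefetch lets each worker fetch a
    # bigger batch of messages per round trip to the broker.
    CELERYD_PREFETCH_MULTIPLIER = 8  # default is 4
    ```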

    DB connection/queries - I count 5+ DB queries executed in the successful case. If you are using the default result backend for django-celery, there are additional queries to store the task result in the DB, and django-celery also closes and reopens the DB connection after each task, which adds some overhead. If you run 5 queries and each takes 100 ms, your task will take at least 500 ms, with or without Celery. Beyond the raw query times, you also need to ensure that nothing in your task locks tables/rows in a way that prevents other tasks from running efficiently in parallel.
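    A back-of-the-envelope check with the numbers above (5 queries at 100 ms each, 3 nodes at concurrency 8) shows the DB floor alone doesn't explain 30 minutes:

    ```python
    # Back-of-the-envelope estimate using the figures from the text.
    queries_per_task = 5
    query_time_s = 0.100                               # 100 ms per query
    min_task_time_s = queries_per_task * query_time_s  # 0.5 s floor per task

    workers = 3 * 8                                    # 3 nodes x concurrency 8
    tasks = 5000
    best_case_min = tasks * min_task_time_s / workers / 60
    print(min_task_time_s, round(best_case_min, 2))    # 0.5 1.74
    ```

    Even with a 500 ms DB floor per task, 24 worker slots should clear 5,000 tasks in under two minutes, so something else (connection churn, locks, or the gateway) must be serializing the work.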

    Gateway response times - Your task calls a remote service, which I assume is an SMS gateway. If that server is slow to respond, your task will be slow. The response times may also differ between a single call and calls made at peak load. In the US, long-code SMS can only be sent at a rate of 1 per second, and depending on where the gateway applies that rate limit, it might be slowing down your tasks.
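    To confirm where the time goes, you could time the gateway call from inside the task. A minimal sketch (`gateway` and `message` stand in for the objects in your task, and the print would normally be a logging call):

    ```python
    import time

    def timed_send(gateway, message):
        """Time the remote gateway call and report how long it took."""
        start = time.perf_counter()
        try:
            return gateway._send(message)
        finally:
            elapsed = time.perf_counter() - start
            print("gateway _send took %.3f s" % elapsed)
    ```

    If the per-call time here dominates the task's total runtime, the gateway (or its rate limiting) is the bottleneck rather than Celery.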