django amazon-web-services celery amazon-sqs django-celery

How to configure Celery with SQS as the broker?


I'm trying to set up an SQS broker for a Celery app configured in a Django project. Here's my setup:

celery.py:

import os

from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "core.settings")

app = Celery("my-app")

app.config_from_object("django.conf:settings", namespace="CELERY")

app.autodiscover_tasks()

My Celery-related Django settings:

CELERY_BROKER_TRANSPORT_OPTIONS = {
    "region": "eu-west-1",
    "queue_name_prefix": "celery-",
}
CELERY_BROKER_URL = "sqs://"
CELERY_TASK_DEFAULT_QUEUE = "default"

This configuration works fine with a RabbitMQ broker, which makes me think the overall setup is correct. With SQS as the broker, the messages are sent to SQS (the "Messages available" counter increases), but once my worker picks them up they move to "Messages in flight" and stay there, seemingly forever (at least for hours). The worker logs show activity, but the task is never actually executed.

Here are some Celery worker logs:

 host;x-amz-date
 786cb490d758593ebf5a6e0c0b34cf025b3309bb0d777891344fd32bd01cb61b
 [2023-11-19 19:52:05,766: DEBUG/MainProcess] StringToSign:
 AWS4-HMAC-SHA256
 20231119T195205Z
 20231119/eu-west-1/sqs/aws4_request
 32a2b642b24ffcd3f42216826611c27799efb1cf756b26e7119d5553009f527c
 [2023-11-19 19:52:05,766: DEBUG/MainProcess] Signature:
 93bcf47703548603fa0787fbbd2ae8aa4b54460db91695f4f6c818b9141b620e
 [2023-11-19 19:52:10,959: DEBUG/MainProcess] Response headers: {}
 [2023-11-19 19:52:10,960: DEBUG/MainProcess] Response body:
 b'<?xml version="1.0"?><ReceiveMessageResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/"><ReceiveMessageResult><Message><MessageId>c4e1b404-.....-c046dc4aeb60</MessageId><ReceiptHandle>AQEBy+4G0x9BF8su10zFWJQuyEXJOFSxF........cRPUdk/vjWzirMiBw97ZSG44M=</ReceiptHandle><MD5OfBody>483202aa2938b...da1b024b63</MD5OfBody><Body>eyJib2R5IjogIlcxdGRMQ0I3...CJyZXBseV90byI6ICJmOWZjY2FmDIyMTBkYTAyMiJ9fQ==</Body><Attribute><Name>ApproximateReceiveCount</Name><Value>1</Value></Attribute></Message></ReceiveMessageResult><ResponseMetadata><RequestId>28a76707-9a01...d3f3a42ad7</RequestId></ResponseMetadata></ReceiveMessageResponse>'
 [2023-11-19 19:52:10,960: DEBUG/MainProcess] Event choose-signer.sqs.ReceiveMessage: calling handler <function set_operation_specific_signer at 0xffff9b901750>
 [2023-11-19 19:52:10,960: DEBUG/MainProcess] Calculating signature using v4 auth.
 [2023-11-19 19:52:10,960: DEBUG/MainProcess] CanonicalRequest:
 POST
 /ACCOUNT_ID/celery-default


 host:sqs.eu-west-1.amazonaws.com
 x-amz-date:20231119T195210Z


 host;x-amz-date
 786cb490d758593ebf5a6e...d777891344fd32bd01cb61b
 [2023-11-19 19:52:10,961: DEBUG/MainProcess] StringToSign:
 AWS4-HMAC-SHA256
 20231119T195210Z
 20231119/eu-west-1/sqs/aws4_request
 a6b5c2bb4cb9eef89...4e16bc164d0a0d9b1a1
 [2023-11-19 19:52:10,961: DEBUG/MainProcess] Signature:
 81bd2621f3ce88b5a76...79dc18951c3bc0c8a
 [2023-11-19 19:52:21,000: DEBUG/MainProcess] Response headers: {}
 [2023-11-19 19:52:21,000: DEBUG/MainProcess] Response body:
 b'<?xml version="1.0"?><ReceiveMessageResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/"><ReceiveMessageResult/><ResponseMetadata><RequestId>802bb8a8-...-e57c9a538959</RequestId></ResponseMetadata></ReceiveMessageResponse>'
 [2023-11-19 19:52:21,000: DEBUG/MainProcess] Event choose-signer.sqs.ReceiveMessage: calling handler <function set_operation_specific_signer at 0xffff9b901750>
 [2023-11-19 19:52:21,001: DEBUG/MainProcess] Calculating signature using v4 auth.
 [2023-11-19 19:52:21,001: DEBUG/MainProcess] CanonicalRequest:
 POST
 /ACCOUNT_ID/celery-default
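
The <Body> value in the ReceiveMessage response above is base64-encoded, so it is hard to tell from the log alone whether the worker received a well-formed task. The sketch below shows one way to decode such a body offline. It assumes the default Celery/Kombu layout (a base64 JSON envelope whose inner "body" is base64 again, with a JSON-serialized task payload). The helper name decode_sqs_body and the sample message are hypothetical; the body in the log above is truncated and cannot be decoded as shown.

```python
import base64
import json


def decode_sqs_body(raw_body: str) -> dict:
    """Decode the <Body> of a Celery message as stored in SQS.

    Assumption: the envelope is base64-encoded JSON, and its "body" field
    is base64-encoded again when properties.body_encoding == "base64".
    """
    envelope = json.loads(base64.b64decode(raw_body))
    payload = envelope["body"]
    if envelope.get("properties", {}).get("body_encoding") == "base64":
        payload = base64.b64decode(payload).decode("utf-8")
    return {
        "headers": envelope.get("headers", {}),
        "properties": envelope.get("properties", {}),
        # Assumes the task payload itself was serialized as JSON.
        "payload": json.loads(payload),
    }


# Hypothetical message built by hand so the example runs without AWS access.
_inner = json.dumps([[], {}, {"callbacks": None, "errbacks": None}])
_sample = base64.b64encode(
    json.dumps(
        {
            "body": base64.b64encode(_inner.encode()).decode(),
            "content-type": "application/json",
            "content-encoding": "utf-8",
            "headers": {"task": "myapp.tasks.add", "id": "hypothetical-id"},
            "properties": {
                "body_encoding": "base64",
                "reply_to": "hypothetical-reply-queue",
            },
        }
    ).encode()
).decode()

decoded = decode_sqs_body(_sample)
print(decoded["headers"]["task"])  # task name the worker is expected to run
print(decoded["payload"][:2])      # positional args and keyword args
```

If the decoded headers name a task the worker has registered, the message itself is probably fine and the problem lies in how the worker handles the response.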

I don't think it's a permissions issue: I can reproduce the problem both on AWS (Fargate) using an execution role and locally using an admin user, with the same result in both cases.

Do you have any idea what's wrong?

  • Checked the worker logs (as shown above)
  • Checked AWS CloudTrail, but couldn't find anything useful
  • Switched to a RabbitMQ broker, which fixes the problem
  • Tried both locally and on AWS, to rule out the stack and permissions as causes
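
A further check that could sit alongside the ones above is to read the two queue counters programmatically instead of from the console. This is a minimal sketch, assuming a boto3 SQS client is passed in. The function name queue_counters is hypothetical; the attribute names are the SQS ones that correspond to "Messages available" and "Messages in flight".

```python
def queue_counters(sqs_client, queue_name: str) -> dict:
    """Return the approximate available / in-flight message counts.

    `sqs_client` is expected to behave like boto3.client("sqs").
    """
    queue_url = sqs_client.get_queue_url(QueueName=queue_name)["QueueUrl"]
    attrs = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",            # "Messages available"
            "ApproximateNumberOfMessagesNotVisible",  # "Messages in flight"
        ],
    )["Attributes"]
    # SQS returns attribute values as strings, so convert them to int.
    return {
        "available": int(attrs["ApproximateNumberOfMessages"]),
        "in_flight": int(attrs["ApproximateNumberOfMessagesNotVisible"]),
    }
```

With credentials configured, it could be called as queue_counters(boto3.client("sqs", region_name="eu-west-1"), "celery-default"), where "celery-default" is the prefixed queue name visible in the logs. Both values are approximate, so polling a few times gives a better picture than a single read.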

Solution

  • In the end, updating celery (and kombu) fixed the issue!

    Source: github.com/celery/celery/pull/8646
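
To confirm which versions are actually installed in the worker's environment before and after upgrading, a small check like the following may help. It uses only the standard library; the helper name installed_versions is hypothetical, and since the answer does not state which releases contain the fix, no version numbers are assumed here.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("celery", "kombu")) -> dict:
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

Running it inside the same container or virtualenv as the worker matters, because a Fargate image and a local environment can easily pin different versions.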