Tags: python, redis, celery

Error: No nodes replied within time constraint


I'm running a worker on Celery [version 5.1.0 (sun-harmonics)] like this:

celery -A task_scheduler.celery_task worker --loglevel=debug -n ekkis -E

The worker seems to run fine and responds to requests. Here's a partial log:

[2022-11-06 03:02:15,913: DEBUG/MainProcess] | Worker: Preparing bootsteps.
[2022-11-06 03:02:15,918: DEBUG/MainProcess] | Worker: Building graph...
[2022-11-06 03:02:15,919: DEBUG/MainProcess] | Worker: New boot order: {StateDB, Beat, Timer, Hub, Pool, Autoscaler, Consumer}
[2022-11-06 03:02:15,933: DEBUG/MainProcess] | Consumer: Preparing bootsteps.
[2022-11-06 03:02:15,934: DEBUG/MainProcess] | Consumer: Building graph...
[2022-11-06 03:02:16,033: DEBUG/MainProcess] | Consumer: New boot order: {Connection, Events, Heart, Mingle, Gossip, Tasks, Control, Agent, event loop}
[2022-11-06 03:02:16,034: INFO/MainProcess] LIQUIDITY PROVISION TASK_SCHEDULER - Genesis Subscribe Started
[2022-11-06 03:02:16,046: DEBUG/MainProcess] http://vault.gitlab-managed-apps.svc.cluster.local:8200 "GET /v1/auth/token/lookup-self HTTP/1.1" 200 908
[2022-11-06 03:02:16,100: DEBUG/MainProcess] http://vault.gitlab-managed-apps.svc.cluster.local:8200 "GET /v1/site/data/prod/site HTTP/1.1" 200 None
[2022-11-06 03:02:16,105: DEBUG/MainProcess] http://vault.gitlab-managed-apps.svc.cluster.local:8200 "GET /v1/auth/token/lookup-self HTTP/1.1" 200 908
[2022-11-06 03:02:16,111: DEBUG/MainProcess] http://vault.gitlab-managed-apps.svc.cluster.local:8200 "GET /v1/site/data/prod/site HTTP/1.1" 200 None

but when I try to get a status:

celery -A task_scheduler.celery_task status

it fails:

Error: No nodes replied within time constraint

I've googled a lot and there's precious little out there that can help. We're running Celery with a Redis (v7.0.5) backend. Any help on how to troubleshoot would be greatly appreciated.

I'm expecting to see a list of the worker nodes
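
For reference, the same liveness check can be made from Python through the broadcast/control API. This is only a sketch, not something from the original post, and it assumes the Celery app instance is exposed as an attribute named app inside task_scheduler.celery_task:

from task_scheduler.celery_task import app  # assumption: the app object is named "app"

# broadcast a ping over the control channel and wait up to 10 seconds;
# an empty result means no worker answered, the same symptom as "celery status"
replies = app.control.ping(timeout=10.0)
print(replies or "no workers replied")

# per-worker view of the same check
print(app.control.inspect(timeout=10.0).ping())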

Addendum I

It appears that the report command works, and I get something like this:

software -> celery:5.1.0 (sun-harmonics) kombu:5.1.0 py:3.6.15
            billiard:3.6.4.0 redis:3.5.3
platform -> system:Linux arch:64bit
            kernel version:5.4.188+ imp:CPython
loader   -> celery.loaders.app.AppLoader
settings -> transport:redis results:disabled

Solution

  • OK, I figured it out. We had connected a handler to the celeryd_after_setup signal:

    from celery.signals import celeryd_after_setup

    @celeryd_after_setup.connect()
    def configure_task(conf=None, **kwargs):
        # runs after the worker instance is set up, but before the worker starts
        genesis_subscribe()
    

    which, due to a failure in a foreign system, never returned:

    import time

    def genesis_subscribe():
        try:
            from task_scheduler.third_party.backfilling import BackFilling
            from task_scheduler.constants import GENESIS
            genesis_backfilling = BackFilling.factory(GENESIS)

            # retry forever: on failure, wait 90 seconds and try again, so this
            # call only returns once the subscription finally succeeds
            while True:
                if genesis_backfilling.subscribe():
                    break
                else:
                    time.sleep(90)
                    logger.info("LIQUIDITY PROVISION TASK_SCHEDULER - Genesis Subscribe FAILURE")

            logger.info("LIQUIDITY PROVISION TASK_SCHEDULER - Genesis Subscribe Completed")
        except Exception as e:
            # logger and CommonUtils are module-level helpers defined elsewhere in the project
            CommonUtils.basic_exception_logger(e)
            logger.error(str(e))
    

    So this implementation would sleep on failure ad infinitum, which meant the worker could never finish starting: it was stuck waiting inside the setup signal handler, so it never answered the status broadcast. Once that code was removed, I now get a proper status from Celery. A sketch of a non-blocking alternative is below.
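
    For reference, one way to keep the retry behaviour without blocking worker startup (not the fix that was actually applied, just a sketch reusing the genesis_subscribe helper above) would be to run the loop in a daemon thread so the signal handler returns immediately:

    # hypothetical alternative: run the (potentially endless) subscribe loop in a
    # background thread so celeryd_after_setup returns and the worker finishes booting
    import threading

    from celery.signals import celeryd_after_setup

    @celeryd_after_setup.connect()
    def configure_task(conf=None, **kwargs):
        # daemon=True so a stuck subscription can't keep the worker process alive on shutdown
        threading.Thread(target=genesis_subscribe, daemon=True).start()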