Tags: python, redis, celery

Error: No nodes replied within time constraint


I'm running a worker on Celery [version 5.1.0 (sun-harmonics)] like this:

celery -A task_scheduler.celery_task worker --loglevel=debug -n ekkis -E

The worker seems to run fine and responds to requests. Here's a partial log:

[2022-11-06 03:02:15,913: DEBUG/MainProcess] | Worker: Preparing bootsteps.
[2022-11-06 03:02:15,918: DEBUG/MainProcess] | Worker: Building graph...
[2022-11-06 03:02:15,919: DEBUG/MainProcess] | Worker: New boot order: {StateDB, Beat, Timer, Hub, Pool, Autoscaler, Consumer}
[2022-11-06 03:02:15,933: DEBUG/MainProcess] | Consumer: Preparing bootsteps.
[2022-11-06 03:02:15,934: DEBUG/MainProcess] | Consumer: Building graph...
[2022-11-06 03:02:16,033: DEBUG/MainProcess] | Consumer: New boot order: {Connection, Events, Heart, Mingle, Gossip, Tasks, Control, Agent, event loop}
[2022-11-06 03:02:16,034: INFO/MainProcess] LIQUIDITY PROVISION TASK_SCHEDULER - Genesis Subscribe Started
[2022-11-06 03:02:16,046: DEBUG/MainProcess] http://vault.gitlab-managed-apps.svc.cluster.local:8200 "GET /v1/auth/token/lookup-self HTTP/1.1" 200 908
[2022-11-06 03:02:16,100: DEBUG/MainProcess] http://vault.gitlab-managed-apps.svc.cluster.local:8200 "GET /v1/site/data/prod/site HTTP/1.1" 200 None
[2022-11-06 03:02:16,105: DEBUG/MainProcess] http://vault.gitlab-managed-apps.svc.cluster.local:8200 "GET /v1/auth/token/lookup-self HTTP/1.1" 200 908
[2022-11-06 03:02:16,111: DEBUG/MainProcess] http://vault.gitlab-managed-apps.svc.cluster.local:8200 "GET /v1/site/data/prod/site HTTP/1.1" 200 None

but when I try to get a status:

celery -A task_scheduler.celery_task status

it fails:

Error: No nodes replied within time constraint

I've googled a lot and there's precious little out there that can help. We're running Celery with a Redis (v7.0.5) backend. Any help on how to troubleshoot would be greatly appreciated.

I'm expecting to see a list of the worker nodes
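
For reference, the same liveness check can be made from Python through the broadcast/control API. This is only a sketch, not something from the original post, and it assumes the Celery app instance is exposed as an attribute named app inside task_scheduler.celery_task:

from task_scheduler.celery_task import app  # assumption: the app object is named "app"

# broadcast a ping over the control channel and wait up to 10 seconds;
# an empty result means no worker answered, the same symptom as "celery status"
replies = app.control.ping(timeout=10.0)
print(replies or "no workers replied")

# per-worker view of the same check
print(app.control.inspect(timeout=10.0).ping())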

Addendum I

It appears that the report command works, and I get something like this:

software -> celery:5.1.0 (sun-harmonics) kombu:5.1.0 py:3.6.15
            billiard:3.6.4.0 redis:3.5.3
platform -> system:Linux arch:64bit
            kernel version:5.4.188+ imp:CPython
loader   -> celery.loaders.app.AppLoader
settings -> transport:redis results:disabled

Solution

  • OK, I figured it out. We had connected a handler to the celeryd_after_setup signal:

    from celery.signals import celeryd_after_setup

    @celeryd_after_setup.connect()
    def configure_task(conf=None, **kwargs):
        # runs after the worker instance is set up, but before the worker starts
        genesis_subscribe()
    

    which, due to a failure in a foreign system, never returned:

    import time

    def genesis_subscribe():
        try:
            from task_scheduler.third_party.backfilling import BackFilling
            from task_scheduler.constants import GENESIS
            genesis_backfilling = BackFilling.factory(GENESIS)

            # retry forever: on failure, wait 90 seconds and try again, so this
            # call only returns once the subscription finally succeeds
            while True:
                if genesis_backfilling.subscribe():
                    break
                else:
                    time.sleep(90)
                    logger.info("LIQUIDITY PROVISION TASK_SCHEDULER - Genesis Subscribe FAILURE")

            logger.info("LIQUIDITY PROVISION TASK_SCHEDULER - Genesis Subscribe Completed")
        except Exception as e:
            # logger and CommonUtils are module-level helpers defined elsewhere in the project
            CommonUtils.basic_exception_logger(e)
            logger.error(str(e))
    

    So this implementation would sleep on failure ad infinitum, which meant the worker could never finish starting: it was stuck waiting inside the setup signal handler, so it never answered the status broadcast. Once that code was removed, I now get a proper status from Celery. A sketch of a non-blocking alternative is below.
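
    For reference, one way to keep the retry behaviour without blocking worker startup (not the fix that was actually applied, just a sketch reusing the genesis_subscribe helper above) would be to run the loop in a daemon thread so the signal handler returns immediately:

    # hypothetical alternative: run the (potentially endless) subscribe loop in a
    # background thread so celeryd_after_setup returns and the worker finishes booting
    import threading

    from celery.signals import celeryd_after_setup

    @celeryd_after_setup.connect()
    def configure_task(conf=None, **kwargs):
        # daemon=True so a stuck subscription can't keep the worker process alive on shutdown
        threading.Thread(target=genesis_subscribe, daemon=True).start()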