I'm running a worker on Celery [version 5.1.0 (sun-harmonics)] like this:
celery -A task_scheduler.celery_task worker --loglevel=debug -n ekkis -E
The worker seems to run fine and responds to requests. Here's a partial log:
[2022-11-06 03:02:15,913: DEBUG/MainProcess] | Worker: Preparing bootsteps.
[2022-11-06 03:02:15,918: DEBUG/MainProcess] | Worker: Building graph...
[2022-11-06 03:02:15,919: DEBUG/MainProcess] | Worker: New boot order: {StateDB, Beat, Timer, Hub, Pool, Autoscaler, Consumer}
[2022-11-06 03:02:15,933: DEBUG/MainProcess] | Consumer: Preparing bootsteps.
[2022-11-06 03:02:15,934: DEBUG/MainProcess] | Consumer: Building graph...
[2022-11-06 03:02:16,033: DEBUG/MainProcess] | Consumer: New boot order: {Connection, Events, Heart, Mingle, Gossip, Tasks, Control, Agent, event loop}
[2022-11-06 03:02:16,034: INFO/MainProcess] LIQUIDITY PROVISION TASK_SCHEDULER - Genesis Subscribe Started
[2022-11-06 03:02:16,046: DEBUG/MainProcess] http://vault.gitlab-managed-apps.svc.cluster.local:8200 "GET /v1/auth/token/lookup-self HTTP/1.1" 200 908
[2022-11-06 03:02:16,100: DEBUG/MainProcess] http://vault.gitlab-managed-apps.svc.cluster.local:8200 "GET /v1/site/data/prod/site HTTP/1.1" 200 None
[2022-11-06 03:02:16,105: DEBUG/MainProcess] http://vault.gitlab-managed-apps.svc.cluster.local:8200 "GET /v1/auth/token/lookup-self HTTP/1.1" 200 908
[2022-11-06 03:02:16,111: DEBUG/MainProcess] http://vault.gitlab-managed-apps.svc.cluster.local:8200 "GET /v1/site/data/prod/site HTTP/1.1" 200 None
but when I try to get a status:
celery -A task_scheduler.celery_task status
it fails:
Error: No nodes replied within time constraint
I've googled a lot and there's precious little out there that can help. We're running Celery with a Redis (v7.0.5) backend. Any help on how to troubleshoot would be greatly appreciated.
I'm expecting to see a list of the worker nodes
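As I understand it, status just broadcasts a ping over the broker and waits a short while (about a second by default) for replies, so the error only means that no worker answered in time. A ping with a longer timeout, using the same app module as above, should get a reply from any worker that is actually up:

celery -A task_scheduler.celery_task inspect ping --timeout 10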
Addendum I
It appears that report works and I get something like this:
software -> celery:5.1.0 (sun-harmonics) kombu:5.1.0 py:3.6.15
billiard:3.6.4.0 redis:3.5.3
platform -> system:Linux arch:64bit
kernel version:5.4.188+ imp:CPython
loader -> celery.loaders.app.AppLoader
settings -> transport:redis results:disabled
OK, I figured it out. We had connected a handler to the celeryd_after_setup signal:
from celery.signals import celeryd_after_setup

@celeryd_after_setup.connect()
def configure_task(conf=None, **kwargs):
    genesis_subscribe()
which, due to a failure in a foreign system, never returned:
def genesis_subscribe():
    try:
        from task_scheduler.third_party.backfilling import BackFilling
        from task_scheduler.constants import GENESIS
        genesis_backfilling = BackFilling.factory(GENESIS)
        while True:
            if genesis_backfilling.subscribe():
                break
            else:
                time.sleep(90)
                logger.info("LIQUIDITY PROVISION TASK_SCHEDULER - Genesis Subscribe FAILURE")
        logger.info("LIQUIDITY PROVISION TASK_SCHEDULER - Genesis Subscribe Completed")
    except Exception as e:
        CommonUtils.basic_exception_logger(e)
        logger.error(str(e))
So this implementation would sleep on failure ad infinitum, which I guess meant the worker could never finish starting up, since the signal handler fires during startup and never returned (which lines up with the partial log above ending shortly after the "Genesis Subscribe Started" line). Once that code was removed, I can now get a proper status from celery.
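In case it helps anyone else: instead of dropping the subscription entirely, a minimal sketch of an alternative (assuming the genesis_subscribe function shown above) is to kick it off on a background thread so the celeryd_after_setup handler returns immediately and the worker can finish booting:

from threading import Thread
from celery.signals import celeryd_after_setup

@celeryd_after_setup.connect()
def configure_task(conf=None, **kwargs):
    # Hand the potentially long-running subscribe loop to a daemon thread
    # so this signal handler returns right away and worker startup continues.
    Thread(target=genesis_subscribe, daemon=True).start()

Capping the retries inside genesis_subscribe (or moving the subscribe into a regular Celery task with retries) would also avoid looping forever when the external system stays down.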