I'm working on a data processing routine in Celery with a Redis backend and broker. Many workers (~200) interact with a broker to get tasks and execute those tasks. However, my workers are all sending heartbeat signals to one another, which populates their logs with all sorts of cruft like this:
[2018-05-13 15:38:00,737: INFO/MainProcess] missed heartbeat from celery@d12chas387.crc.nd.edu
[2018-05-13 15:38:00,737: INFO/MainProcess] missed heartbeat from celery@d12chas530.crc.nd.edu
[2018-05-13 15:38:00,737: INFO/MainProcess] missed heartbeat from celery@d12chas531.crc.nd.edu
[2018-05-13 15:38:00,738: INFO/MainProcess] missed heartbeat from celery@d12chas351.crc.nd.edu
[2018-05-13 15:38:00,738: INFO/MainProcess] missed heartbeat from celery@d12chas515.crc.nd.edu
[2018-05-13 15:38:00,739: INFO/MainProcess] missed heartbeat from celery@d12chas492.crc.nd.edu
The workers should never interface with each other directly; they should all get the information they need from the broker. Is it possible to disable heartbeats between worker nodes? If so, is this a bad idea for some reason I'm not yet seeing?
You can try running your workers with --without-gossip to prevent this from happening. As of Celery 3.1, workers passively subscribe to events from other workers, including heartbeats, which is where these log messages come from.
Gossip was added so that Celery users could take advantage of worker-to-worker communication, for example rerouting tasks to the best worker, but it is fine to disable it if your workers have no reason to communicate. You can read more about what worker gossip is and why it was introduced here: Celery 3.1 What's New.
You might also add the --without-mingle option to disable worker synchronization on startup.
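If you start your workers from Python rather than from the command line, the two flags can be passed through the app's worker_main method. The following is only a minimal sketch: the helper names build_worker_argv and start_worker are made up for illustration, and the leading "worker" entry in the argument list matches what recent Celery versions expect, so check it against the version you run.

```python
def build_worker_argv(loglevel="INFO"):
    """Return worker arguments with gossip and mingle disabled."""
    return [
        "worker",
        f"--loglevel={loglevel}",
        "--without-gossip",  # do not subscribe to other workers' events
        "--without-mingle",  # skip synchronizing with other workers at startup
    ]


def start_worker(app, loglevel="INFO"):
    """Start a worker in this process.

    `app` is assumed to be your existing celery.Celery instance,
    already configured with your Redis broker and backend.
    """
    app.worker_main(argv=build_worker_argv(loglevel))
```

Calling start_worker(app) should behave like adding --without-gossip --without-mingle to your usual celery worker command line.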
Also, this question seems related: celery missed heartbeat (on_node_lost)