cadence-workflow, temporal-workflow, uber-cadence

Scaling limitation of workflow workers due to continuous polling in Uber Cadence?


I am evaluating Cadence for implementing our business orchestration. I understand that workers continuously poll the task list for tasks to execute. My concern is whether this will cause scaling problems. The worker is always busy polling some database, and it also needs to execute the business logic, so is there a possibility that it runs out of resources and then crashes or drops tasks?

How does this polling mechanism scale when we have millions of workflows? Will it delay execution of the workflow code when there are millions of tasks in the task list?


Solution

  • Cadence and Temporal use long polling over gRPC to listen to task queues. If a queue has no messages, a poll request stays open and returns empty only about once per minute, so workers do not consume excessive resources on polling. In addition, most poll calls never reach the database, thanks to various optimizations in the matching service.

    The number of open workflows does not affect polling performance at all, since many of these workflows can be passively waiting on a timer or an external event. What determines how many tasks have to be delivered to workers is the number of operations per second that the workflows execute. If the cluster and workers are provisioned correctly, even a high task rate should not cause any issues.
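
To make the long-polling behavior described above concrete, here is a minimal toy sketch in Python. It is not the Cadence or Temporal client API: `TaskQueue`, `long_poll`, `run_worker`, and `demo` are hypothetical names, an in-process `queue.Queue` stands in for the server, and the timeout is shortened from roughly a minute to a fraction of a second so the demo finishes quickly.

```python
import queue
import threading
import time


class TaskQueue:
    """Toy stand-in for a server-side task queue (hypothetical, not a real API)."""

    def __init__(self):
        self._tasks = queue.Queue()

    def add_task(self, task):
        self._tasks.put(task)

    def long_poll(self, timeout_s):
        """Block until a task arrives or the timeout expires; None means empty."""
        try:
            return self._tasks.get(timeout=timeout_s)
        except queue.Empty:
            return None


def run_worker(task_queue, stop, poll_timeout_s, stats):
    """Poll in a loop, counting empty polls and recording handled tasks."""
    while not stop.is_set():
        task = task_queue.long_poll(poll_timeout_s)
        if task is None:
            stats["empty_polls"] += 1  # the poll timed out with nothing to do
        else:
            stats["handled"].append(task)  # business logic would run here


def demo(poll_timeout_s=0.2, idle_s=0.5):
    """Run one worker through an idle period, then deliver a single task."""
    tq = TaskQueue()
    stop = threading.Event()
    stats = {"empty_polls": 0, "handled": []}
    worker = threading.Thread(
        target=run_worker, args=(tq, stop, poll_timeout_s, stats)
    )
    worker.start()
    time.sleep(idle_s)      # idle: the worker is blocked, not spinning
    tq.add_task("task-1")   # wakes the blocked poll immediately
    time.sleep(0.1)
    stop.set()
    worker.join()
    return stats


if __name__ == "__main__":
    print(demo())
```

While the queue is empty the worker thread is parked inside the blocking call, so the number of poll round trips is bounded by elapsed time divided by the poll timeout, regardless of how many workflows are open. A task that arrives mid-poll is handed over right away rather than waiting for the next polling interval.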