There are four different timeout options in the ActivityOptions, and two of those are mandatory without any default values: ScheduleToStartTimeout
and StartToCloseTimeout
.
What considerations should be made when selecting values for these timeouts?
As mentioned in the question, there are four different timeout options in ActivityOptions, and the differences between them may not be super clear to a new Cadence user. Let’s first briefly explain what those are:
ScheduleToStartTimeout
: This configuration specifies the maximum
duration between the time the Activity is scheduled by a workflow and
it’s picked up by an activity worker to start executing it. In other
words, it configures the time a task spends in the queue.StartToCloseTimeout
: This one specifies the maximum time taken by
an activity worker from the time it fetches a task until it reports
the completion of it to the Cadence server.ScheduleToCloseTimeout
: This configuration specifies an end-to-end
timeout duration for an activity from the time it is scheduled by the
workflow until it is completed by an activity worker.HeartbeatTimeout
: If your activity is a heartbeating activity, this
configuration basically specifies the maximum duration the Cadence
server would wait for a heartbeat before assuming the activity worker
has failed.How to select a proper timeout value
Picking the StartToCloseTimeout
is fairly straightforward when you know what it does. Essentially, you should make this long enough so that the activity can complete under normal circumstances. Therefore, you should account for everything that can affect the time taken by an activity worker the latency of your down-stream (ie. services, networking etc.). On the other hand, you should aim to keep this value as small as it’s feasible to make your end-to-end system more responsive. If you can’t make this timeout less than a couple of minutes (ideally 1 minute or less), you should consider using a HeartbeatTimeout
config and implement heartbeating in your activity.
ScheduleToCloseTimeout
is also easy to understand, but it is more common to face issues caused by picking a less-than-ideal value here. Therefore, it’s important to ensure that a moment to pay some extra attention to this configuration.
Basically, you should consider everything that can create a backlog in the activity task queue. Some common events that contribute to a backlog are:
Ideally, no activity should timeout while waiting in the task queue, especially if the queue is backed up and the activity is configured to be retried. Because the retries would add more activity tasks to the queue and subsequently make it harder to recover from backlog or make it even worse. On the other hand, there are many use cases where business requirements really limit the total time the system can take to process an activity. Therefore, it’s usually not a bad idea to aim for a high ScheduleToCloseTimeout
value as long as the business requirements allow. Depending on your use case, it might not make sense to keep your activity in the queue for more than a few minutes or it might be perfectly fine to keep it there for several days before timing out.