cadence-workflow

DecisionTaskTimedOut before the specified timeout


I have a case where a decision task times out after 5 seconds even though its timeout is set to 10 seconds:

  17  2019-06-13T17:46:59Z  DecisionTaskScheduled     {TaskList:{Name:maxim-C02XD0AAJGH6:db09fd84-98bf-4546-a0d8-fb51e30c2b41},
                                                      StartToCloseTimeoutSeconds:10, Attempt:0}
  18  2019-06-13T17:47:04Z  DecisionTaskTimedOut      {ScheduledEventId:17,
                                                      StartedEventId:0,
                                                      TimeoutType:SCHEDULE_TO_START}

This is against a Cadence service running in a local Docker container, and I can reproduce it reliably.


Solution

  • The 5-second timeout comes from Cadence's Sticky Execution feature. Sticky Execution is enabled by default on a Cadence worker: after the worker responds with decisions, it keeps the workflow state cached. The Cadence server can then dispatch new decision tasks directly to that same worker, which reuses the cached state and produces new decisions without replaying the entire execution history.

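    A SCHEDULE_TO_START timeout is applied to sticky decision tasks so that a decision can be sent to another worker when the original worker restarts and nothing is polling the sticky tasklist for that workflow execution. When the timeout fires, the Cadence server clears the stickiness for that execution and dispatches the decision to the original tasklist, where any worker can pick it up. That is why the event above reports TimeoutType:SCHEDULE_TO_START after 5 seconds rather than after the 10-second StartToCloseTimeoutSeconds. The worker option that controls this timeout: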

    // Optional: Sticky schedule to start timeout.
    // default: 5s
    // The resolution is seconds. See details about StickyExecution on the comments for DisableStickyExecution.
    StickyScheduleToStartTimeout time.Duration