I have been facing this issue intermittently, about once every two weeks. Each time it occurs, the failure points at activity ID 26, yet I can see that activity 26 completed successfully. I'm not sure why the decision task fails on an activity that completed successfully.
{
    "eventTimestamp": 1500668941.164,
    "eventType": "WorkflowExecutionFailed",
    "eventId": 866,
    "workflowExecutionFailedEventAttributes": {
        "details": "java.lang.IllegalArgumentException: Unknown DecisionId [type=ACTIVITY, id=26]. The possible causes are non-deterministic workflow definition code or incompatible change in the workflow definition.
            at com.amazonaws.services.simpleworkflow.flow.worker.DecisionsHelper.getDecision(DecisionsHelper.java:613)
            at com.amazonaws.services.simpleworkflow.flow.worker.DecisionsHelper.handleActivityTaskScheduled(DecisionsHelper.java:171)
            at com.amazonaws.services.simpleworkflow.flow.worker.AsyncDecider.processEvent(AsyncDecider.java:267)
            at com.amazonaws.services.simpleworkflow.flow.worker.AsyncDecider.decide(AsyncDecider.java:515)
            at com.amazonaws.services.simpleworkflow.flow.worker.AsyncDecisionTaskHandler.handleDecisionTask(AsyncDecisionTaskHandler.java:50)
            at com.amazonaws.services.simpleworkflow.flow.worker.DecisionTaskPoller.pollAndProcessSingleTask(DecisionTaskPoller.java:201)
            at com.amazonaws.services.simpleworkflow.flow.worker.GenericWorker$PollServiceTask.run(GenericWorker.java:94)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)",
        "decisionTaskCompletedEventId": 865,
        "reason": "Unknown DecisionId [type=ACTIVITY, id=26]. The possible causes are non-deterministic workflow definition code or incompatible change in the workflow definition."
    }
}
Any help would be appreciated.
The AWS Flow Framework uses replay to reconstruct workflow state, so it requires workflow code to be deterministic: replaying the same history must bring the code into exactly the same state every time. There are multiple ways to break determinism. The most common are changing workflow code (or configuration) while some workflows are still running, and using system time instead of the Clock object provided by the framework. Do you do any time calculations, or have you changed your workflow code recently?
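Note that the stack trace fails in `handleActivityTaskScheduled`, i.e. while matching the recorded *scheduled* event for activity 26 against the decisions produced by replaying the code, so the activity having completed does not help. The toy model below is a self-contained sketch of that mechanism, not the framework's actual implementation. It assumes activity IDs are sequential positions (loosely mimicking the numeric `id=26`), and the names `workflow`, `replay`, `DEADLINE_MILLIS` and the activity names are all hypothetical. The workflow branches on the current time; replaying with the time recorded for the original decision succeeds, while replaying with a later wall-clock time produces fewer decisions than the history contains.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplayDeterminismSketch {

    // Hypothetical cut-off used by the toy workflow below.
    static final long DEADLINE_MILLIS = 1_500_000_000_000L;

    // Toy workflow logic: returns the activities it would schedule, in order.
    // An activity's ID is its 1-based position in this list.
    static List<String> workflow(long nowMillis) {
        List<String> scheduled = new ArrayList<>();
        scheduled.add("fetchInput");
        scheduled.add("processInput");
        // Non-deterministic branch: depends on when the code happens to run.
        if (nowMillis < DEADLINE_MILLIS) {
            scheduled.add("sendReminder");
        }
        return scheduled;
    }

    // Re-runs the workflow and matches every recorded "scheduled" event
    // against the regenerated decisions, by ID and then by activity name.
    static void replay(List<String> history, long nowMillis) {
        List<String> decisions = workflow(nowMillis);
        for (int i = 0; i < history.size(); i++) {
            int id = i + 1;
            if (i >= decisions.size()) {
                throw new IllegalArgumentException(
                        "Unknown DecisionId [type=ACTIVITY, id=" + id + "]");
            }
            if (!decisions.get(i).equals(history.get(i))) {
                throw new IllegalStateException("Activity " + id + " changed from "
                        + history.get(i) + " to " + decisions.get(i));
            }
        }
    }

    public static void main(String[] args) {
        long originalTime = DEADLINE_MILLIS - 1;   // first decision ran before the deadline
        long laterWallClock = DEADLINE_MILLIS + 1; // a later replay runs after it
        List<String> history = workflow(originalTime);

        // Deterministic: replay sees the time recorded for the original decision.
        replay(history, originalTime);
        System.out.println("replay with recorded time: ok");

        // Non-deterministic: replay reads the current wall clock instead.
        try {
            replay(history, laterWallClock);
        } catch (IllegalArgumentException e) {
            System.out.println("replay with wall clock: " + e.getMessage());
        }
    }
}
```

Running it prints `replay with recorded time: ok` followed by `replay with wall clock: Unknown DecisionId [type=ACTIVITY, id=3]`. In real decider code, the equivalent of passing `originalTime` is reading time only from the framework-provided clock, which returns consistent values during replay.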
See the "Making Changes to Decider Code: Versioning and Feature Flags" section of the AWS Flow Framework documentation for ways to update workflow code without breaking running workflows.
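As a rough illustration of the feature-flag idea, here is a self-contained sketch (not taken from that guide, and not using any Flow Framework API). The assumption is that the flag travels with the execution's input, so it is fixed when an execution starts and every replay of that execution takes the same branch. `OrderInput`, `useNewValidation` and the activity names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class FeatureFlagSketch {

    // Hypothetical workflow input. The flag is fixed when an execution is
    // started, so every replay of that execution sees the same value.
    static final class OrderInput {
        final String orderId;
        final boolean useNewValidation;

        OrderInput(String orderId, boolean useNewValidation) {
            this.orderId = orderId;
            this.useNewValidation = useNewValidation;
        }
    }

    // Toy workflow: returns the activities it would schedule, in order.
    static List<String> workflow(OrderInput input) {
        List<String> scheduled = new ArrayList<>();
        scheduled.add("fetchOrder:" + input.orderId);
        if (input.useNewValidation) {
            // New behavior: only executions started with the flag set take this path.
            scheduled.add("validateOrderV2");
            scheduled.add("auditValidation");
        } else {
            // Old behavior is kept so already-running executions replay unchanged.
            scheduled.add("validateOrder");
        }
        scheduled.add("shipOrder");
        return scheduled;
    }

    public static void main(String[] args) {
        // Execution started before the change: still yields the old decisions.
        System.out.println(workflow(new OrderInput("order-1", false)));
        // Execution started after the change opts in to the new path.
        System.out.println(workflow(new OrderInput("order-2", true)));
    }
}
```

This prints `[fetchOrder:order-1, validateOrder, shipOrder]` and then `[fetchOrder:order-2, validateOrderV2, auditValidation, shipOrder]`. The key design choice is that the branch depends only on data recorded with the execution, never on something that can change between the original run and a replay. The same guide also covers versioning, which suits larger changes than a single branch.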