The AWS Step Functions documentation says that Synchronous Express Workflows are guaranteed to execute "at-most-once", as opposed to "exactly-once" (Standard Workflows) or "at-least-once" (Asynchronous Express Workflows).
What is the behavior when an execution does not complete successfully? Does the call time out? Does it return a 500?
I tried to find the answer in the AWS docs but couldn't.
With Synchronous Express Workflows, as the name implies, the workflow execution runs synchronously with your HTTP request to the API (StartSyncExecution). In most cases, Step Functions completes that execution and returns a response to you, the caller, which your code can use to confirm that the execution completed. That said, problems can occur.
For example, the TCP connection from your application to Step Functions could be interrupted, and your code receives a network-level exception. Depending on the timing, the request may have reached the Step Functions service and started an execution that will run to completion. Or it might not have. Your code can't tell the difference, so the execution ran either zero times or once (at-most-once).
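To make the "zero times or once" point concrete, here is a small in-memory simulation. It is a sketch, not the Step Functions API: `FakeSyncService`, `call_over_network` and `observe` are hypothetical names, and nothing here talks to AWS. It shows that a request dropped on the way in and a response dropped on the way out raise the same exception in the caller, while the number of executions that actually ran differs.

```python
from typing import Tuple

# Hypothetical in-memory model of the two network failure points described
# above. It does NOT call AWS; it only illustrates the caller's ambiguity.


class FakeSyncService:
    """Stands in for Step Functions and counts executions that actually ran."""

    def __init__(self) -> None:
        self.executions_run = 0

    def start_sync_execution(self) -> dict:
        self.executions_run += 1  # the workflow runs to completion
        return {"status": "SUCCEEDED"}


def call_over_network(service: FakeSyncService,
                      drop_request: bool, drop_response: bool) -> dict:
    """Simulate one HTTP round trip that can break before or after the server."""
    if drop_request:
        # The connection broke before the request reached the service.
        raise ConnectionError("connection reset")
    response = service.start_sync_execution()
    if drop_response:
        # The execution ran, but the response never made it back.
        raise ConnectionError("connection reset")
    return response


def observe(drop_request: bool, drop_response: bool) -> Tuple[str, int]:
    """Return (what the caller saw, how many executions really ran)."""
    service = FakeSyncService()
    try:
        call_over_network(service, drop_request, drop_response)
        seen = "response"
    except ConnectionError as exc:
        seen = f"network error: {exc}"
    return seen, service.executions_run


for case in [(False, False), (True, False), (False, True)]:
    print(case, observe(*case))
```

The last two cases print the same caller-side message but different execution counts (0 and 1), which is exactly why this path can only promise at-most-once.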
Or, if Step Functions receives the request and begins to run your workflow execution, but then experiences a server-side failure (we try super hard to avoid these, but they can happen), you receive an HTTP 5xx error, and the service will not re-run that execution for you (at-most-once). This differs from Asynchronous Express Workflows: once you get a successful response to the StartExecution call, even if the execution is later interrupted by a server-side failure, Step Functions recognizes that and re-runs the execution from the beginning (at-least-once).
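Putting the three cases together, a caller of a Synchronous Express Workflow can reason about the outcome roughly as follows. This is a sketch, not an AWS-provided helper: `interpret` and `CallerView` are hypothetical, and it assumes your code has already extracted either the parsed success response (whose `status` field for StartSyncExecution is `SUCCEEDED`, `FAILED` or `TIMED_OUT`), the HTTP status code of an error response, or the fact that a network-level exception occurred.

```python
from enum import Enum
from typing import Optional

# Hypothetical helper: maps what the caller observed to what it can conclude.
# It does not call AWS; the inputs are assumed to be extracted by your code.


class CallerView(Enum):
    COMPLETED = "ran once and finished; the result is in the response"
    INTERRUPTED = "may have started or partially run; Step Functions will not re-run it"
    UNKNOWN = "ran zero times or once; the caller cannot tell which"


TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED_OUT"}


def interpret(response: Optional[dict] = None,
              http_status: Optional[int] = None,
              network_error: bool = False) -> CallerView:
    """Classify one StartSyncExecution attempt from the caller's point of view."""
    if network_error:
        # Request or response lost in transit: at-most-once, outcome unknown.
        return CallerView.UNKNOWN
    if http_status is not None and 500 <= http_status <= 599:
        # Server-side failure: the service does not retry the execution.
        return CallerView.INTERRUPTED
    if response is not None and response.get("status") in TERMINAL_STATUSES:
        # A response arrived, so the execution ran and reached a final state
        # (FAILED and TIMED_OUT still mean it ran; the workflow itself failed).
        return CallerView.COMPLETED
    raise ValueError("observation not covered by this sketch")


print(interpret(response={"status": "FAILED"}).value)
print(interpret(http_status=500).value)
print(interpret(network_error=True).value)
```

In the UNKNOWN and INTERRUPTED cases Step Functions will not retry for you, so whether to start another execution is your code's decision.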
I hope this helps. That doc page is trying to explain some subtle aspects of distributed systems as they relate to Step Functions. I will see how we can refine it to make it clearer.