go aws-lambda aws-serverless aws-step-functions

AWS Step Function error handling for Go Lambda

I cannot find a detailed explanation of how to define the error condition matcher in the Step function, based on the error returned by the Go handler.

The handler is a bog-standard Go function that returns an error if it gets a 503 from an upstream service:

func HandleHotelBookingRequest(ctx context.Context, booking HotelBookingRequest) (
    confirmation HotelBookingResponse, err error) {
    
    ...
        if statusCode == http.StatusServiceUnavailable {
            err = errors.New("TransientError")
        } else {

I can control what the function returns and how it formats the string, but I cannot find any real information about what to use here (or in a Catch clause, for that matter) so that this matches the above:

      "Retry": [
        {
          "ErrorEquals": [
            "TransientError"
          ],
          "BackoffRate": 1,
          "IntervalSeconds": 1,
          "MaxAttempts": 3,
          "Comment": "Retry for Transient Errors (503)"
        }
      ]

When I test the Lambda in the Console, this is what I get (as expected) when the upstream returns a 503:

{
  "errorMessage": "TransientError",
  "errorType": "errorString"
}

And I have the distinct impression (but not quite sure how to validate this) that if I change to:

          "ErrorEquals": [
            "errorString"
          ],

the Retry works (at least, looking at the CloudWatch logs, I can see the transient errors being logged, but the Step function eventually succeeds).

I cannot find much documentation on this but:

would it be possible to match on the actual error message (I saw that API Gateway allows this, using a regex)?
if that's not possible, should I return a different "error type" instead of a plain error?

Thanks in advance!

Solution

Finally solved the riddle. In the end it was trivial and nearly identical to the JavaScript approach, which (a) gave me the hint and (b) is widely documented in examples. However, as I was unable to find a Go-specific answer anywhere (in the AWS documentation, which is otherwise expansive, good, and detailed; on Google; or here), I am posting it here for future reference.

TL;DR - define your own implementation of the error interface and return a value of that type, instead of the bog-standard errors.New() or fmt.Errorf(), then use the type name in the ErrorEquals clause.

A very basic example implementation is shown in this gist.
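For readers without the gist to hand, here is a minimal sketch of the idea. It is not the gist's code: the TransientError type, its StatusCode field, and the checkUpstream helper are hypothetical names chosen to line up with the Retry clause in the question. In a real Lambda the returned error would come out of the handler passed to lambda.Start from aws-lambda-go, which is left out here so the sketch compiles with the standard library alone.

```go
package main

import (
	"fmt"
	"net/http"
)

// TransientError is a hypothetical custom error type. Its type name is the
// string the state machine's ErrorEquals clause would be matched against.
type TransientError struct {
	StatusCode int
}

// Error implements the error interface.
func (e *TransientError) Error() string {
	return fmt.Sprintf("upstream returned %d", e.StatusCode)
}

// checkUpstream is a hypothetical stand-in for the part of the handler that
// inspects the upstream status code.
func checkUpstream(statusCode int) error {
	if statusCode == http.StatusServiceUnavailable {
		return &TransientError{StatusCode: statusCode}
	}
	return nil
}

func main() {
	err := checkUpstream(http.StatusServiceUnavailable)
	fmt.Printf("%T: %v\n", err, err)
}
```

With a type like this, "ErrorEquals": ["TransientError"] in the Retry clause should match, on the assumption that the reported errorType is the bare type name.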

To test this, I have created an ErrorStateMachine (JSON definition in the same gist) and selected a different catcher based on the ErrorEquals type:

        {
          "ErrorEquals": [
            "HandlerError"
          ],
          "Next": "Handler Error"
        }

Testing the Step Function with different Outcome inputs causes different paths to be chosen.

What I guess tripped me up was that I am a relative beginner when it comes to Go, and I hadn't realized that errorString is the concrete type behind the error value returned by the errors.New() function, which fmt.Errorf() also uses internally (at least when no %w verb is involved):

// in errors/errors.go

// errorString is a trivial implementation of error.
type errorString struct {
    s string
}

I had naively assumed that this was just something that AWS named.
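To see where the name comes from, the sketch below prints the bare type name of a few error values using reflection. It assumes the Lambda Go runtime derives errorType in roughly this way (dereference a pointer, then take the type's name); that matches the output observed above, but the typeName helper here is my own illustration, not the runtime's code. HandlerError is a hypothetical custom type.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// HandlerError is a hypothetical custom error type.
type HandlerError struct{ msg string }

func (e *HandlerError) Error() string { return e.msg }

// typeName returns the bare name of err's concrete type, looking through a
// pointer first. Assumption: the Lambda runtime reports errorType similarly.
func typeName(err error) string {
	t := reflect.TypeOf(err)
	if t.Kind() == reflect.Ptr {
		return t.Elem().Name()
	}
	return t.Name()
}

// demoNames collects the type names for three ways of building an error.
func demoNames() []string {
	return []string{
		typeName(errors.New("TransientError")),
		typeName(fmt.Errorf("status %d", 503)),
		typeName(&HandlerError{msg: "error from a failed handler"}),
	}
}

func main() {
	// Prints errorString, errorString, HandlerError (one per line).
	for _, name := range demoNames() {
		fmt.Println(name)
	}
}
```

This is why matching on "errorString" appeared to work: it matches every error created with errors.New(), whatever its message.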

An interesting twist (which is not really ideal) is that the actual error message is "wrapped" in the Step function output and may be a bit cumbersome to parse in subsequent steps:

{
  "Error": "HandlerError",
  "Cause": "{\"errorMessage\":\"error from a failed handler\",\"errorType\":\"HandlerError\"}"
}

It would certainly have been a lot more developer-friendly to have the actual error message (generated by Error()) emitted straight into the Cause field.
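If a later step is itself a Go Lambda, one way to get at the message is to decode twice: once for the outer object and once for the JSON string held in Cause. The sketch below assumes the error output has exactly the shape shown above; the stepError and lambdaCause types and the unwrapCause helper are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stepError mirrors the outer object shown above.
type stepError struct {
	Error string `json:"Error"`
	Cause string `json:"Cause"`
}

// lambdaCause mirrors the JSON document embedded, as a string, in Cause.
type lambdaCause struct {
	ErrorMessage string `json:"errorMessage"`
	ErrorType    string `json:"errorType"`
}

// unwrapCause decodes the outer object, then decodes the Cause string as JSON.
func unwrapCause(raw []byte) (lambdaCause, error) {
	var outer stepError
	if err := json.Unmarshal(raw, &outer); err != nil {
		return lambdaCause{}, fmt.Errorf("decoding error output: %w", err)
	}
	var inner lambdaCause
	if err := json.Unmarshal([]byte(outer.Cause), &inner); err != nil {
		return lambdaCause{}, fmt.Errorf("decoding Cause: %w", err)
	}
	return inner, nil
}

func main() {
	raw := []byte(`{"Error":"HandlerError","Cause":"{\"errorMessage\":\"error from a failed handler\",\"errorType\":\"HandlerError\"}"}`)
	cause, err := unwrapCause(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cause.ErrorMessage) // prints "error from a failed handler"
}
```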

Hope others find this useful and won't have to waste time on this like I did.