Search code examples
retry-logicdapr

Retry Policy to Not Trigger For POST Requests


Is it possible to configure the resiliency policy in Dapr to not perform a retry when the HTTP request is POST? The Polly library in .NET, for example, allows fine-grained configuration of retry and circuit breaker policies. Is it possible to do the same in Dapr?

apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: resiliency-policy
scopes:
  - shoppingbasket
spec:
  policies:
    retries:
      operation:
        policy: constant
        duration: 5s
        maxRetries: 3

    circuitBreakers:
      serviceCB:
        maxRequests: 1
        timeout: 30s
        trip: consecutiveFailures >= 3

  targets:
    apps:
      discount:
        retry: operation
        circuitBreaker: serviceCB

Solution

  • Dapr is written in Golang so it does not use Polly under the hood.

    It has a slightly different trigger concept:

    • In Polly you can define when should the retry policy/strategy handle the outcome of the given operation.
    • In Dapr a retry could be triggered whenever a given operation returns an abnormal state. Here you can find some examples.

    In one of the github issues they call out the following:

    You're correct that it retries on any error, including 4xx responses. For the initial version of resiliency, we took the approach of more retries is better than no retries. In the future, we may add in additional filters/scenarios that you can trigger a retry off of. I know there were plans to add in more CEL level expressions for Phase 2 of resiliency.

    In the proposal PR you can see some mentioning about conditional retries:

     apis:
        invoke:
          - match: appId == "appB"
            # Nested rules: Prevent duplicative checks in rules.
            # Its likely that controler-gen does not support this
            # but apiextensionsv1.JSON can be used as workaround.
            rules:
              - match:
                  request.method == "GET" &&
                  request.metadata.count > 1000
                timeout: largeResponse
                retry: largeResponse
              - match:
                  request.path == "/someOperation"
                retry: someOperation
    

    Unfortunately Phase 2 did not happen yet as far as I know. There is one related open PR: https://github.com/dapr/dapr/pull/7132.