c# .net-7.0 polly retry-logic grpc-c#

Identify which subscribed gRPC service is down (.NET 7)


I have this implementation for subscribing to gRPC streaming services, but I need to detect when one of the services goes down and raise an event to notify the UI.

public async Task Subscribe()
{
    await Policy
        .Handle<RpcException>(e => e.Status.StatusCode == StatusCode.Unavailable)
        .WaitAndRetryAsync(
            10,
            attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)),
            onRetry: (e, ts) =>
            {
                // TotalSeconds is the whole delay; Seconds is only the 0-59 component.
                logger.Warning("Subscription connection lost. Trying to reconnect in {Seconds}s!", ts.TotalSeconds);
            })
        .ExecuteAsync(async () =>
        {
            IAsyncEnumerable<Notification> stream = await subscribe.Subscribe(currentUser);
            await foreach (Notification? ev in stream)
            {
                switch (ev.ActionCase)
                {
                    case Notification.ActionOneofCase.Service1:
                        logger.Warning("Incoming reply 'Service1'");
                        break;

                    case Notification.ActionOneofCase.Service2:
                        //TODO:
                        break;
                }
            }
        });
}

I tried to use Polly, but I don't know how to detect that one specific service is down. I need to know when one of the services goes off so I can notify the UI. What would be the best approach to identify which service went down?

EDIT:

That's how each service is injected:

private static void AddGrpcService<T>(IServiceCollection services,
                                        Config config) where T : class
{
    SocketsHttpHandler socketsHandler = new SocketsHttpHandler()
    {
        PooledConnectionIdleTimeout = Timeout.InfiniteTimeSpan,
        KeepAlivePingDelay = TimeSpan.FromSeconds(60),
        KeepAlivePingTimeout = TimeSpan.FromSeconds(30),
        EnableMultipleHttp2Connections = true
    };

    MethodConfig defaultMethodConfig = new MethodConfig
    {
        Names = { MethodName.Default },
        RetryPolicy = new RetryPolicy
        {
            MaxAttempts = 5,
            InitialBackoff = TimeSpan.FromSeconds(1),
            MaxBackoff = TimeSpan.FromSeconds(5),
            BackoffMultiplier = 1.5,
            RetryableStatusCodes = { StatusCode.Unavailable }
        }
    };

    ServiceConfig serviceConfig = new() { MethodConfigs = { defaultMethodConfig } };
    services.AddGrpcClient<T>(o => {
        o.Address = new Uri(config.GrpcUrl);
    })
            .ConfigureChannel(o => {
                o.Credentials = GetGrpcClientCredentials(config);
                o.ServiceConfig = serviceConfig;
            })
            .ConfigurePrimaryHttpMessageHandler(() => socketsHandler);
}

Solution

  • Is your problem determining that a downstream service is down?

    For the HttpClient and Polly integration there is a static method called HandleTransientHttpError. It triggers whenever the status code is 408 or 5xx, and also on an HttpRequestException.

    Please bear in mind that it will NOT trigger for status codes like 429 (Too Many Requests), which could also indicate that the downstream service is overloaded.

    Initially I would have suggested targeting similar status codes here. Since I'm not familiar with gRPC, though, this is only a best guess based on the documentation and this Envoy issue.

    readonly StatusCode[] RetriableStatusCodes = new[] 
    { 
      StatusCode.Cancelled, 
      StatusCode.DeadlineExceeded, 
      StatusCode.ResourceExhausted
    };
    
    ...
    await Policy
          .Handle<RpcException>(e => RetriableStatusCodes.Contains(e.Status.StatusCode))
          ...
    

  • Do you want to know where and how you should fire the notification?

    The onRetry or onRetryAsync delegate is probably the best place to fire a notification.

    These user-supplied delegates are called when the policy triggers, before it goes to sleep.

    In other words, if the initial attempt fails and the Handle predicate evaluates to true, the policy calls the onRetry(Async) delegate. Once the delegate has completed, the policy sleeps and then makes the first retry attempt.

    The onRetry(Async) delegate won't be called:

    • if the Handle predicate evaluates to false
    • if the max retry count is exceeded, regardless of the outcome of the Handle predicate
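
One way to turn this into a per-service notification is sketched below. It is not a definitive implementation: it assumes each gRPC service has its own subscribe call (as the per-client registration in the question suggests), and the `SubscriptionWatcher` class, its `ServiceUnavailable` event and the `WatchAsync` method are hypothetical names invented for this illustration. The service name travels in Polly's `Context` as the operation key, so the `onRetry` delegate knows which service failed.

```csharp
using System;
using System.Threading.Tasks;
using Grpc.Core;
using Polly;

// Hypothetical wrapper (not part of Polly or gRPC): runs one subscription per
// service under its own retry policy and reports which service is failing.
public sealed class SubscriptionWatcher
{
    // Assumed notification hook for the UI: service name, triggering exception,
    // and the delay before the next attempt.
    public event Action<string, Exception, TimeSpan>? ServiceUnavailable;

    public Task WatchAsync(string serviceName, Func<Task> subscribe, int retryCount = 10)
    {
        return Policy
            .Handle<RpcException>(e => e.Status.StatusCode == StatusCode.Unavailable)
            .WaitAndRetryAsync(
                retryCount,
                attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)),
                // Runs after a handled failure and before the sleep; the Context
                // passed to ExecuteAsync carries the service name.
                onRetry: (exception, delay, context) =>
                    ServiceUnavailable?.Invoke(context.OperationKey, exception, delay))
            .ExecuteAsync(_ => subscribe(), new Context(serviceName));
    }
}
```

A caller would attach a handler to `ServiceUnavailable` and start one `WatchAsync` call per service, each with its own name. Because `onRetry` is not invoked once the retry count is exhausted, the last `RpcException` propagates out of `WatchAsync`; wrapping that call in a try/catch is one place to raise a final "gave up" notification.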