I want to implement retry logic in my application when using Google Cloud AI Platform's PredictionServiceClient from NuGet - https://www.nuget.org/packages/Google.Cloud.AIPlatform.V1 . Retry scenarios include the case where the call quota is exceeded (status 429 Too Many Requests) or the service is unavailable. I want to use the same retry and timeout properties for all calls made through the PredictionServiceClient. Earlier in the same application, I used the Azure OpenAI NuGet package (https://www.nuget.org/packages/Azure.AI.OpenAI/), where I was able to provide the retry properties when creating the client, like below -
OpenAIClientOptions opts = new OpenAIClientOptions();
opts.Retry.MaxRetries = 2;
opts.Retry.Delay = TimeSpan.FromSeconds(60);
opts.Retry.NetworkTimeout = TimeSpan.FromSeconds(100);
opts.Retry.Mode = (RetryMode)connDetails.AzureOpenAIRetryMode.Value;
client = new OpenAIClient(new Uri("sampleUrl"),new AzureKeyCredential("sampleKey"), opts);
I am using the Google Cloud AI Platform's PredictionServiceClient to stream content generation like below -
Google.Cloud.AIPlatform.V1.PredictionServiceClient.StreamGenerateContentStream response = myPredictionServiceClient.StreamGenerateContent(generateContentRequest);
What I have tried - Below is the code I wrote for retry -
PredictionServiceSettings settings = new PredictionServiceSettings
{
    CallSettings = CallSettings.FromRetry(RetrySettings.FromExponentialBackoff(
        maxAttempts: 2,
        initialBackoff: TimeSpan.FromSeconds(1),
        maxBackoff: TimeSpan.FromSeconds(10),
        backoffMultiplier: 2,
        retryFilter: RetrySettings.FilterForStatusCodes(StatusCode.Unavailable)
    )).WithTimeout(TimeSpan.FromSeconds(100))
};
bardClient = await new PredictionServiceClientBuilder
{
    Settings = settings,
    Endpoint = connDetails.AzureOpenAIResourceUrl
}.BuildAsync();
An error is thrown at the StreamGenerateContent method call: the Google API does not permit retries for this type of operation. Below is the error message -
HResult=0x80131509
Message=Retry not permitted for this operation type
Source=Google.Api.Gax.Grpc
StackTrace:
at Google.Api.Gax.Grpc.CallSettingsExtensions.ValidateNoRetry(CallSettings callSettings)
at Google.Api.Gax.Grpc.ApiServerStreamingCall.<>c__DisplayClass0_0`2.<Create>b__1(TRequest req, CallSettings cs)
at Google.Cloud.AIPlatform.V1.PredictionServiceClientImpl.StreamGenerateContent(GenerateContentRequest request, CallSettings callSettings) in Google.Cloud.AIPlatform.V1\PredictionServiceClientImpl.cs:line 250
It originates from this method in the Google API code - https://github.dev/googleapis/gax-dotnet/blob/ee799ad91309ef3102dee60f1baa67d7ec772548/Google.Api.Gax.Grpc/CallSettingsExtensions.cs#L192
My question is: how can I implement retry logic for streaming service calls made through PredictionServiceClient? Any hint pointing me to the right resource or sample code would be helpful.
For non-streaming responses, the retry settings work as expected. For the streaming scenario, refer to this.
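Since the library rejects retry settings on server-streaming calls, one possible workaround is to retry at the application level. Below is a minimal sketch of that idea, not something taken from the Google libraries: `StreamingRetry` and `WithRetry` are hypothetical names, and the rule of retrying only while no item has been yielded yet is an assumption made so that the caller never sees duplicated partial output. The helper uses only the .NET base class library and takes a predicate to decide which exceptions are transient.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of any Google library).
public static class StreamingRetry
{
    // Restarts the stream with exponential backoff when it fails with a
    // transient error BEFORE producing its first item. Once an item has been
    // yielded, any later error is propagated to the caller unchanged.
    public static async IAsyncEnumerable<T> WithRetry<T>(
        Func<IAsyncEnumerable<T>> startStream,
        Func<Exception, bool> isTransient,
        int maxAttempts,
        TimeSpan initialBackoff,
        TimeSpan maxBackoff,
        double backoffMultiplier = 2.0,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        TimeSpan delay = initialBackoff;
        for (int attempt = 1; ; attempt++)
        {
            IAsyncEnumerator<T>? enumerator = null;
            bool yieldedAny = false;
            try
            {
                while (true)
                {
                    bool hasItem;
                    bool retry = false;
                    try
                    {
                        // Start the call lazily so a failure here is also caught.
                        enumerator ??= startStream().GetAsyncEnumerator(cancellationToken);
                        hasItem = await enumerator.MoveNextAsync();
                    }
                    catch (Exception ex) when (!yieldedAny && attempt < maxAttempts && isTransient(ex))
                    {
                        hasItem = false;
                        retry = true;
                    }

                    if (retry) break;          // leave the inner loop and back off
                    if (!hasItem) yield break; // stream completed normally
                    yieldedAny = true;
                    yield return enumerator!.Current;
                }
            }
            finally
            {
                if (enumerator != null) await enumerator.DisposeAsync();
            }

            // Wait before the next attempt, then grow the delay up to maxBackoff.
            await Task.Delay(delay, cancellationToken);
            delay = TimeSpan.FromTicks(
                (long)Math.Min(delay.Ticks * backoffMultiplier, maxBackoff.Ticks));
        }
    }
}
```

With the client from the question, this could be called as `StreamingRetry.WithRetry(() => myPredictionServiceClient.StreamGenerateContent(generateContentRequest).GetResponseStream(), ex => ex is RpcException rpc && (rpc.StatusCode == StatusCode.Unavailable || rpc.StatusCode == StatusCode.ResourceExhausted), maxAttempts: 2, initialBackoff: TimeSpan.FromSeconds(1), maxBackoff: TimeSpan.FromSeconds(10))` and consumed with `await foreach`. This assumes that quota errors (HTTP 429) reach the client as `StatusCode.ResourceExhausted`, which is worth confirming against the errors you actually observe. The call settings that reach the streaming call must not contain retry settings, otherwise the same "Retry not permitted" exception is thrown; a timeout-only setting such as `CallSettings.FromExpiration(Expiration.FromTimeout(TimeSpan.FromSeconds(100)))` can still be passed.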