This is similar to Java Grpc: invalidate dns cache, but more specific to a Docker development environment where services come up and down (some stacks are pretty resource intensive, so they are shut down and brought back up on an as-needed basis). It's easier to do docker stack rm and docker stack deploy for just the parts we need than to write scale-up and scale-down scripts for each stack. In that situation, when the services are recreated the VIP can change (I know I could pin the IP on the network, but then each stack would have to know which addresses are available, which is something I'd rather avoid).
Given that context, DNS may point to a stale address, so I set -Dnetworkaddress.cache.ttl=5.
Looking at the grpc-java code, that appears to be all I need to set for consistent behavior, at least within the 5-second TTL.
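(For reference, networkaddress.cache.ttl is a JDK security property, so it can also be set programmatically at startup, before the first name lookup; the class and method names below are just illustrative placeholders.)

import java.security.Security;

public class DnsCacheTtlConfig {

    // Call this before any InetAddress lookups so the caching policy picks it up.
    public static void configure() {
        // Cache successful DNS lookups for at most 5 seconds,
        // so a recreated service's new VIP is picked up quickly.
        Security.setProperty("networkaddress.cache.ttl", "5");
        // Optionally, don't hold on to failed lookups either.
        Security.setProperty("networkaddress.cache.negative.ttl", "0");
    }
}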
But say I want this to be performant: only retry when needed, and define the channel only once (since, according to the documentation, creating a channel is a very expensive operation). I set things up as follows, but I am getting occasional failures. I don't reuse the stubs because that appeared to cause problems in my first attempt.
import javax.annotation.PostConstruct; // jakarta.annotation.* on Spring Boot 3+
import javax.annotation.PreDestroy;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

@Component
public class ClientProvider {

    @Value("${srv.host}")
    private String host;

    @Value("${srv.port}")
    private int port;

    @Value("${srv.maxRetryAttempts:5}")
    private int maxRetryAttempts;

    @Value("${srv.maxMessageSize:50000000}")
    private int maxMessageSize;

    // Single channel shared for the lifetime of the application.
    private ManagedChannel channel;

    @PostConstruct
    public void initChannel() {
        channel = ManagedChannelBuilder
                .forAddress(host, port)
                .usePlaintext()
                .enableRetry()
                .maxRetryAttempts(maxRetryAttempts)
                .maxInboundMessageSize(maxMessageSize)
                .build();
    }

    public MyBlockingStub blockingStub() {
        return MyGrpc.newBlockingStub(channel);
    }

    public MyStub stub() {
        return MyGrpc.newStub(channel);
    }

    @PreDestroy
    public void shutdownChannel() {
        channel.shutdown();
    }
}
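To illustrate how I intended this to be used: the channel lives for the application's lifetime while a cheap stub is created per call, roughly like this (the service, request, and method names are placeholders):

import java.util.concurrent.TimeUnit;

import org.springframework.stereotype.Service;

@Service
public class MyCaller {

    private final ClientProvider clientProvider;

    public MyCaller(ClientProvider clientProvider) {
        this.clientProvider = clientProvider;
    }

    public MyReply call(MyRequest request) {
        // Fresh stub per call over the shared channel, with a per-call deadline.
        return clientProvider.blockingStub()
                .withDeadlineAfter(10, TimeUnit.SECONDS)
                .myMethod(request);
    }
}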
So I got rid of the single channel and created a channel every time I needed a stub. It seems to work, but I don't think this is the right way of doing things, especially since I am not cleaning up the channels after use.
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

@Component
public class ClientProvider {

    @Value("${srv.host}")
    private String host;

    @Value("${srv.port}")
    private int port;

    @Value("${srv.maxRetryAttempts:5}")
    private int maxRetryAttempts;

    @Value("${srv.maxMessageSize:50000000}")
    private int maxMessageSize;

    // A brand-new channel is built for every stub, and nothing shuts it down.
    public ManagedChannel channel() {
        return ManagedChannelBuilder
                .forAddress(host, port)
                .usePlaintext()
                .enableRetry()
                .maxRetryAttempts(maxRetryAttempts)
                .maxInboundMessageSize(maxMessageSize)
                .build();
    }

    public MyBlockingStub blockingStub() {
        return MyGrpc.newBlockingStub(channel());
    }

    public MyStub stub() {
        return MyGrpc.newStub(channel());
    }
}
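If I do keep per-call channels, I assume I would at least need to shut each one down after use, something like this (just a sketch; MyReply, MyRequest, and myMethod are placeholders):

    // Sketch of a caller that releases the per-call channel.
    public MyReply callOnce(MyRequest request) throws InterruptedException {
        ManagedChannel channel = clientProvider.channel();
        try {
            return MyGrpc.newBlockingStub(channel).myMethod(request);
        } finally {
            channel.shutdown();
            if (!channel.awaitTermination(5, TimeUnit.SECONDS)) {
                channel.shutdownNow(); // force-close if graceful shutdown times out
            }
        }
    }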
I am thinking it is just a matter of configuring the channel correctly, but I am not sure whether I am missing anything.
The existing logic only retries the DNS lookup when refresh() is called after the timeout period. refresh() is only called as needed, for example when the channel moves to the TRANSIENT_FAILURE state or during the reconnect attempts after that, so your original plan of setting a relatively short cache TTL was a good approach.
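If you want to observe those transitions yourself, you can watch connectivity state with the ManagedChannel API; a minimal sketch (the resetConnectBackoff() call is optional and just short-circuits the remaining reconnect backoff once the channel has hit TRANSIENT_FAILURE):

import io.grpc.ConnectivityState;
import io.grpc.ManagedChannel;

public final class ChannelStateWatcher {

    // Log connectivity transitions and optionally skip the reconnect backoff
    // after TRANSIENT_FAILURE, so the re-resolved address is tried sooner.
    public static void watch(ManagedChannel channel) {
        ConnectivityState current = channel.getState(false); // false = don't force a connect
        channel.notifyWhenStateChanged(current, () -> {
            ConnectivityState next = channel.getState(false);
            System.out.println("channel state changed to " + next);
            if (next == ConnectivityState.TRANSIENT_FAILURE) {
                channel.resetConnectBackoff(); // optional: retry immediately
            }
            watch(channel); // re-register for the next transition
        });
    }

    private ChannelStateWatcher() {
    }
}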