gremlin, gremlin-server

How to recover with a retry from gremlin NoHostAvailableException


I am using the Gremlin Java driver to connect to a local Gremlin Server.

Simple code flow

Creating client

Cluster cluster = Cluster.build().addContactPoint(<endp>).port(<port>).enableSsl(false).create();
Client client = cluster.connect();

Submit Script

client.submit("g.V().count()");

If Gremlin Server is down when I submit for the first time, subsequent retries still fail to create a connection, even after the server has been brought back up.

Exception on the first attempt, while Gremlin Server is down:

org.apache.tinkerpop.gremlin.driver.exception.NoHostAvailableException: All hosts are considered unavailable due to previous exceptions. Check the error log to find the actual reason

Exception after Gremlin Server is brought back up:

tinkerpop.gremlin.driver.exception.NoHostAvailableException: All hosts are considered unavailable due to previous exceptions

One thing to note: I do not create a new client on retry. I only repeat the Submit Script step:

client.submit("g.V().count()");

Gremlin Server may well go down at any time. How can I recover in such circumstances? Fundamentally, is NoHostAvailableException recoverable?


Solution

  • Normally, the Client should attempt to reconnect to a host that was previously marked unavailable. By default, it should retry the host at 1-second intervals, as governed by the connectionPool.reconnectInterval configuration. In your case, however, I think you've uncovered a bug where the reconnect attempts are never started because the Client was never able to reach the host in the first place. As of 3.4.11, the only remedy is to recreate the Client, as you noted in your comments. I've created an issue to track this problem: TINKERPOP-2569
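
The "recreate the Client" workaround could be wrapped in a small helper along these lines. This is only a sketch under assumptions: RecreatingRetry, its call method, and the attempt and back-off parameters are hypothetical names invented for illustration and are not part of the Gremlin driver. It uses only the JDK so it can run on its own; the comment at the top shows how it might be wired to the cluster and client from the question. It also catches every RuntimeException, which is broader than NoHostAvailableException, so real code would probably want to narrow that.

```java
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper: keeps one resource (for example a Client) and throws it
// away and rebuilds it from the factory whenever an action on it fails.
// Possible wiring for the question's code (assumption, not tested here):
//   RecreatingRetry<Client> retry =
//       new RecreatingRetry<>(() -> cluster.connect(), Client::close);
//   ResultSet rs = retry.call(c -> c.submit("g.V().count()"), 5, 1000L);
final class RecreatingRetry<R> {
    private final Supplier<R> factory;  // builds a fresh resource
    private final Consumer<R> closer;   // releases a failed resource
    private R resource;                 // null until first use or after a failure

    RecreatingRetry(Supplier<R> factory, Consumer<R> closer) {
        this.factory = factory;
        this.closer = closer;
    }

    // Runs the action, recreating the resource after each failure,
    // for at most maxAttempts attempts with a fixed pause in between.
    public synchronized <T> T call(Function<R, T> action, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (resource == null) {
                    resource = factory.get();
                }
                return action.apply(resource);
            } catch (RuntimeException e) {
                last = e;
                discard();  // never reuse a resource that has just failed
                if (attempt < maxAttempts) {
                    pause(backoffMillis);
                }
            }
        }
        throw last;  // all attempts failed: surface the most recent error
    }

    private void discard() {
        if (resource != null) {
            try {
                closer.accept(resource);
            } catch (RuntimeException ignored) {
                // closing a broken resource may fail as well; nothing more to do
            }
            resource = null;
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();  // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
    }
}
```

The resource is created lazily inside the try block, so a failure while creating it is retried in the same way as a failure while submitting. The answer above only says that the Client must be recreated; whether the Cluster also needs to be rebuilt is not stated there, so the wiring in the comment is a guess that follows the question's own code.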