titan gremlin tinkerpop gremlin-server

Titan: parallel queries fail with java.util.concurrent.TimeoutException at org.apache.tinkerpop.gremlin.driver.Client.submit


As part of a volume and performance test, I am trying to execute multiple Gremlin requests (graph traversals) in parallel using Java threads. It works fine with a small number of threads.

When I increase the number of threads (say, to 500), I get the following error:

Exception in thread "Thread-34" java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.TimeoutException: Timed out waiting for an available host.
    at org.apache.tinkerpop.gremlin.driver.Client.submit(Client.java:146)
    at com.tests.java.titan.Vertices.exists(Vertices.java:37)
    at com.tests.java.titan.Complex.searchNodesRelatedByRelation(Complex.java:110)
    at com.tests.java.perfTests.TitanThread.run(ParallelGraphTraversal.java:112)
Caused by: java.lang.RuntimeException: java.util.concurrent.TimeoutException: Timed out waiting for an available host.
    at org.apache.tinkerpop.gremlin.driver.Client.submitAsync(Client.java:194)
    at org.apache.tinkerpop.gremlin.driver.Client.submitAsync(Client.java:174)
    at org.apache.tinkerpop.gremlin.driver.Client.submit(Client.java:144)
    ... 3 more

I tried increasing threadPoolWorker from 1 to 2 and gremlinPool from 8 to 16 (in gremlin-server.yaml), but I did not notice any difference.

Has anyone faced this issue? Is there a limit on the maximum number of simultaneous connections?

Our environment: CDH 5.7.1, Titan 1.1.0-SNAPSHOT, Solr 4.10.3, HBase 1.2.0, with titan-tp3-driver used to open the remote connection to Gremlin Server and to run the queries.


Solution

  • The gremlinPool setting on the server tends to be limited to Runtime.availableProcessors(), so it usually doesn't make sense to make the number bigger than that. The number of requests the server can support depends to some extent on the types of traversals you're executing. I could imagine situations where you send a series of long-running requests that tie up enough gremlinPool threads to slow down the script processing capabilities of the server itself. Gremlin Server will likely continue to accept requests, storing them in a queue as they arrive, but they will take longer to process.

    This situation in and of itself should not force this error, but the default settings of the driver may be inadequate for what you are trying to do. The driver has a number of settings that control the flow of messages to the server. If the state of the driver for a particular host falls outside the boundaries of those settings, the driver will ignore that host and look for another. For example, if connectionPool.maxInProcessPerConnection is exceeded and no additional connections can be added because the pool size is also at its maximum, then that host will be skipped when the driver selects the next host to send a message to. In this way, a particular host won't be overloaded with requests by the client.

    In your situation, I assume there are no other hosts in your configuration, so there is nowhere else to send those requests and the driver waits for a connection to free up. In the example above, it waits for the number of in-process requests to fall below connectionPool.maxInProcessPerConnection. How long will the driver wait for that to happen? It will wait as long as connectionPool.maxWaitForConnection. If that time is exceeded, you get the error message you are seeing.
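
To make the waiting behaviour described above concrete, here is a minimal sketch in plain Java. It is only a toy model and not the driver's actual code: the class HostSlotModel, its methods and its constructor parameters are made up for illustration. The semaphore stands in for the in-process limit (the role played by connectionPool.maxInProcessPerConnection) and the timeout stands in for connectionPool.maxWaitForConnection, assuming a single host.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Toy model of a single host with a bounded number of in-process requests (hypothetical class). */
final class HostSlotModel {
    private final Semaphore slots;     // free in-process slots on the only host
    private final long maxWaitMillis;  // stands in for connectionPool.maxWaitForConnection

    HostSlotModel(int maxInProcess, long maxWaitMillis) {
        this.slots = new Semaphore(maxInProcess);
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Claims a slot, or gives up once the maximum wait has elapsed. */
    void acquire() throws TimeoutException, InterruptedException {
        if (!slots.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("Timed out waiting for an available host.");
        }
    }

    /** Frees a slot, as happens when a response comes back. */
    void release() {
        slots.release();
    }

    public static void main(String[] args) throws Exception {
        HostSlotModel host = new HostSlotModel(2, 100);
        host.acquire();
        host.acquire();          // both slots are now busy
        try {
            host.acquire();      // a third request waits 100 ms, then fails
        } catch (TimeoutException e) {
            System.out.println("third request: " + e.getMessage());
        }
        host.release();          // one response arrives and frees a slot
        host.acquire();          // the next request is accepted again
        System.out.println("fourth request accepted after a slot was freed");
    }
}
```

With 500 threads sharing one client, the same thing presumably happens at scale: requests that cannot get a slot within the wait time fail with the timeout. That suggests either raising the driver's connectionPool limits and wait time, or capping the number of requests in flight on the client side.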