When inserting tens of thousands of nodes and edges into a Cassandra-backed TinkerPop graph, I see that nearly every service is mostly idle except Gremlin Server. That is, the client connected to the websocket and sending Gremlin-formatted commands is not consuming much CPU time, and neither is Cassandra or Elasticsearch. Gremlin Server, on the other hand, is consuming several CPUs (on a rather beefy machine with dozens of cores and hundreds of gigabytes of RAM).
Increasing the number of Gremlin Server worker threads doesn't have a positive impact, and neither does increasing the number of simultaneous websocket requests permitted (a client-side setting). Oddly, allowing an unbounded number of concurrent websocket requests results in data failing to be inserted without any HTTP error responses.
My working theory is that Gremlin Server's bottleneck is evaluation of the Gremlin commands (g.addV, etc.). Does anyone have experience achieving high ingest rates using the websocket plugin, or is it necessary to write my own JVM language plugin that works on binary data to avoid parsing and evaluating strings?
EDIT: The scripts are batches of up to 100 statements of either vertex insertions or edge/vertex/edge insertions:
The vertex insertions:
graph.addVertex(label, tyParam, 'ident', vertexName, param1, val1, param2, val2, ...);
graph.addVertex(...);
...
For triples of edge, vertex, edge:
edgeNode = graph.addVertex(...);
g.V().has('ident', var).next().addEdge(var2, edgeNode);
edgeNode.addEdge(var3, g.V().has('ident', var4).next());
'ident' is indexed, so the .has('ident', ...) lookup should be fast. Sadly, the dataset includes edges whose source or destination vertex does not exist, causing "FastNoSuchElementException" errors. In those error cases we split the batch of statements in half and retry it as two smaller insertion attempts. For example, a failing script of 50 edge/vertex/edge insertion statements becomes two scripts of 25, and the process continues all the way down to a script with a single e/v/e insertion, where any failure is ignored.
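The halve-and-retry logic described above can be sketched as follows (a minimal illustration only; `submit` stands in for whatever call sends a batch of statements over the websocket and reports success or failure):

```java
import java.util.List;
import java.util.function.Predicate;

public class BatchRetry {
    // Recursively halves a failing batch. `submit` is assumed to return true
    // when the whole batch inserts cleanly and false otherwise (e.g. a
    // FastNoSuchElementException for a missing edge endpoint).
    static void insertWithSplit(List<String> statements, Predicate<List<String>> submit) {
        if (statements.isEmpty() || submit.test(statements)) return;
        if (statements.size() == 1) return; // a single failing statement is ignored
        int mid = statements.size() / 2;
        insertWithSplit(statements.subList(0, mid), submit);
        insertWithSplit(statements.subList(mid, statements.size()), submit);
    }
}
```

Note the cost profile: one bad statement in a batch of N forces roughly log2(N) extra round trips for that half of the data, which is another reason a high per-request overhead on the server hurts.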
N.B. I'm using Titan 1.0.
Script parameterization is important for getting the best performance out of Gremlin Server. Without it, the server cannot effectively utilize its script cache, and every script it processes must be compiled, which is typically one of the more expensive parts of a request. Note that for the script cache to work, scripts must be textually identical, right down to variable names. In other words:
g.V(var1)
g.V(var2)
are not identical because they have different variable names and therefore will not take advantage of the script cache.
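As a sketch of what parameterization looks like from a JVM client: define the script text once and vary only the bindings per request, so every submission hits the same cache entry. (The names `SCRIPT` and `identValue` below are illustrative, and the actual submission would go through the TinkerPop driver, shown here only as a comment since it needs a running server.)

```java
import java.util.HashMap;
import java.util.Map;

public class ParameterizedSubmit {
    // One script string, reused verbatim for every request, so Gremlin Server
    // compiles it once and serves subsequent requests from its script cache.
    static final String SCRIPT = "g.V().has('ident', identValue).next()";

    // Per-request values travel as bindings rather than being inlined into the
    // script text.
    static Map<String, Object> bindingsFor(Object value) {
        Map<String, Object> bindings = new HashMap<>();
        bindings.put("identValue", value);
        return bindings;
    }

    public static void main(String[] args) {
        // With the TinkerPop Java driver this would be submitted as, e.g.:
        //   client.submit(SCRIPT, bindingsFor("vertex-42"));
        System.out.println(SCRIPT);
    }
}
```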
In the event that it is not possible to make scripts identical, it would be smart to submit such scripts with the #jsr223.groovy.engine.keep.globals request parameter set to something other than hard (i.e., soft, weak, or phantom) so that Gremlin Server can reclaim memory from the script cache as new scripts arrive.
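One plausible way to carry that hint is alongside the regular request bindings, as sketched below. This is an assumption on my part; check how your Gremlin Server version expects the keep.globals hint to be delivered before relying on it.

```java
import java.util.HashMap;
import java.util.Map;

public class CacheHint {
    // Builds request bindings that include the keep.globals hint next to the
    // script's own parameters. "soft" allows cached scripts to be garbage
    // collected under memory pressure instead of being held forever.
    static Map<String, Object> bindingsWithCacheHint(Map<String, Object> params) {
        Map<String, Object> bindings = new HashMap<>(params);
        bindings.put("#jsr223.groovy.engine.keep.globals", "soft");
        return bindings;
    }
}
```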