I have an XML file with about 1 million vertices and edges. I want to load them into Gremlin Server, but after about 200,000 nodes I get an OutOfMemoryException in Gremlin Server.
I tried two different ways:
GraphTraversal<Vertex, Vertex> verttmp = g.addV(mapprop.get(LABEL)).property(T.id, keyid);
verttmp.next();
and with a Client:
client.submit(query);
Is there a safe way to work with so many nodes?
I assume that you get this OutOfMemoryException in Gremlin Server. By default, Gremlin Server is configured with -Xmx4096m, which may be insufficient for the size of the graph you are loading (especially if you are using TinkerGraph, which is a purely in-memory graph). You need to increase -Xmx in your gremlin-server.sh file until there is enough memory to hold your graph. Perhaps start by doubling it to 8192m, but given that you only got through 20% of your load, I'd wonder whether doubling is enough.
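As a rough sanity check on why doubling may not be enough, the figures from the question can be extrapolated linearly. This sketch assumes (an assumption, not a measurement) that heap use grows in proportion to the number of loaded elements; in practice fixed server overhead, caches and GC headroom make it non-linear, and if unparameterized scripts are part of the problem the real requirement could be lower. The class and method names are hypothetical.

```java
// Hypothetical helper: naive linear extrapolation of Gremlin Server heap needs.
// ASSUMPTION: heap usage scales proportionally with the number of loaded elements.
class HeapEstimate {

    // Heap (MB) needed for totalElements if heapMb was exhausted after loadedElements.
    static long requiredHeapMb(long heapMb, long loadedElements, long totalElements) {
        return heapMb * totalElements / loadedElements;
    }

    // Elements that would fit in newHeapMb under the same proportional assumption.
    static long elementsThatFit(long newHeapMb, long oldHeapMb, long loadedElements) {
        return loadedElements * newHeapMb / oldHeapMb;
    }

    public static void main(String[] args) {
        // Figures from the question: -Xmx4096m ran out after about 200,000 of 1,000,000 elements.
        System.out.println("estimated -Xmx for full load: "
                + requiredHeapMb(4096, 200_000, 1_000_000) + "m"); // 20480m
        System.out.println("elements that fit in 8192m:   "
                + elementsThatFit(8192, 4096, 200_000));           // 400000
    }
}
```

Under that assumption, 8192m would only carry the load to around 400,000 elements, roughly 40% of the file.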
As a side note, if you're throwing away the result of verttmp.next() (and in your example code it appears that you are), then it would be better to do:
g.addV(mapprop.get(LABEL)).property(T.id, keyid).iterate();
That will be significantly cheaper because you don't waste time returning results that have to be serialized and sent across the wire. Also, if you are submitting scripts (i.e. where query is a String), then I can see that you are not parameterizing your requests. That's a performance killer: Gremlin Server compiles and caches each distinct script it receives, so embedding the values directly in the script text turns every request into a new script. I can imagine that driving memory requirements much higher than needed, raising the OutOfMemoryException much earlier than it should. Either modify your code to use parameters or simply make bytecode-based requests with remote traversals:
GraphTraversalSource g = traversal().withRemote(DriverRemoteConnection.using("localhost",8182,"g"));
I'd suggest the bytecode approach, which saves you from having to parameterize and lets you write your Gremlin as code rather than as embedded strings.
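If you do stay with scripts, here is a minimal, dependency-free sketch of what parameterization changes from the server's point of view. It only builds the request strings and counts how many distinct script texts a load would send; it does not connect to Gremlin Server. The script text, the binding names vLabel and vId, and the class and method names are hypothetical. With gremlin-driver, the parameterized form would be sent via Client.submit(String, Map<String, Object>) instead of client.submit(query).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical comparison of concatenated vs parameterized Gremlin scripts.
class ScriptStyles {

    // Parameterized: one fixed script text; the per-vertex values travel as bindings.
    static final String ADD_VERTEX = "g.addV(vLabel).property(T.id, vId).iterate()";

    // Anti-pattern: values are pasted into the text, so every vertex yields a new script.
    static String concatenated(String label, long id) {
        return "g.addV('" + label + "').property(T.id, " + id + "L).iterate()";
    }

    // Bindings that would accompany ADD_VERTEX, e.g. client.submit(ADD_VERTEX, bindings(label, id)).
    static Map<String, Object> bindings(String label, long id) {
        Map<String, Object> params = new HashMap<>();
        params.put("vLabel", label);
        params.put("vId", id);
        return params;
    }

    // Counts the distinct script texts that loading n vertices would send to the server.
    static int distinctScripts(int n, boolean parameterized) {
        Set<String> scripts = new HashSet<>();
        for (long id = 0; id < n; id++) {
            String label = (id % 2 == 0) ? "person" : "software";
            scripts.add(parameterized ? ADD_VERTEX : concatenated(label, id));
        }
        return scripts.size();
    }

    public static void main(String[] args) {
        System.out.println("concatenated:  " + distinctScripts(1000, false)); // 1000
        System.out.println("parameterized: " + distinctScripts(1000, true));  // 1
        System.out.println("sample bindings: " + bindings("person", 42L));
    }
}
```

Because the parameterized script text never changes, the server has a single script to compile and cache no matter how many vertices you load; only the bindings differ per request.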