graph-databases, gremlin, tinkerpop, janusgraph

Why is JanusGraph's addVertex() much slower than addV() with a graph traversal?


I'm using JanusGraph to add vertices to a Cassandra-backed database, and I noticed a large performance discrepancy between adding a vertex with (1) the addVertex() method provided by the JanusGraph Java libraries and (2) the addV() Gremlin traversal step. Why is there such a discrepancy?

I am using JanusGraph version 0.2.0 with cql as the storage backend. I created a test that compares the time in milliseconds it takes to add and commit a vertex to the graph with three methods: (1) the addV() Gremlin step, (2) the addV() Gremlin step followed by a next() step to get the newly created vertex, and (3) the JanusGraph addVertex() method. I am starting from completely empty graph storage. The code I used is below.

final Builder builder = JanusGraphFactory.build()
        .set("storage.backend", "cql")
        .set("storage.hostname", Config.get(CommonConfig.cassandra_host));

final JanusGraph graph = builder.open();

long nowMillis = TimeUtils.nowMillis();
graph.traversal().addV("myLabel");
graph.traversal().tx().commit();
System.out.println("(1) - Add vertex traversal only took " + (TimeUtils.nowMillis() - nowMillis) + " millis");

nowMillis = TimeUtils.nowMillis();
graph.traversal().addV("myLabel").next();
graph.traversal().tx().commit();
System.out.println("(2) - Add vertex traversal and next took " + (TimeUtils.nowMillis() - nowMillis) + " millis");

nowMillis = TimeUtils.nowMillis();
graph.addVertex("myLabel");
graph.traversal().tx().commit();
System.out.println("(3) - Add vertex method took " + (TimeUtils.nowMillis() - nowMillis) + " millis");

This is a sample output of running this:

(1) - Add vertex traversal only took 15 millis
(2) - Add vertex traversal and next took 739 millis
(3) - Add vertex method took 682 millis

This suggests to me that (3), adding with JanusGraph's addVertex(), does something similar to (2), but I don't understand why the time differences are so large. What causes (2) and (3) to take orders of magnitude longer to run than (1)?


Solution

  • The first bit of Gremlin you are testing doesn't actually create a vertex. Traversals are lazy: addV("myLabel") only builds a Traversal object, and nothing is executed until you iterate it with a terminal step such as next() or iterate(). So (1) measures only the construction of the Traversal, while (2) and (3) actually create a Vertex in the graph. The general recommendation is not to use Graph.addVertex(), as it is not a user-focused API; it is meant for graph providers like JanusGraph. Use only the Gremlin language to interact with your graph, which gives you the widest level of code portability.
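
To make the "built but never iterated" point concrete, here is a minimal toy model in plain Java. It is only a sketch of the lazy behaviour: ToyGraph and ToyTraversal are hypothetical stand-ins invented for illustration, not TinkerPop or JanusGraph classes, and the real implementation is far more involved. The only thing it is meant to show is that building a traversal has no effect on the graph until a terminal step such as next() or iterate() is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a graph: just a list of vertex labels.
final class ToyGraph {
    final List<String> vertices = new ArrayList<>();

    // Only builds a traversal description; the graph is not touched here.
    ToyTraversal addV(String label) {
        return new ToyTraversal(this, label);
    }
}

// Hypothetical stand-in for a traversal: it records the work to be done.
final class ToyTraversal {
    private final ToyGraph graph;
    private final String label;
    private boolean executed = false;

    ToyTraversal(ToyGraph graph, String label) {
        this.graph = graph;
        this.label = label;
    }

    // Terminal step: performs the recorded work and returns the result.
    String next() {
        if (executed) {
            throw new NoSuchElementException("traversal already exhausted");
        }
        executed = true;
        graph.vertices.add(label);
        return label;
    }

    // Terminal step: performs the recorded work for its side effect only.
    void iterate() {
        if (!executed) {
            next();
        }
    }
}

final class LazyTraversalDemo {
    public static void main(String[] args) {
        ToyGraph graph = new ToyGraph();

        graph.addV("myLabel");                     // like case (1): built, never run
        System.out.println(graph.vertices.size()); // prints 0

        graph.addV("myLabel").next();              // like case (2): run, result returned
        System.out.println(graph.vertices.size()); // prints 1

        graph.addV("myLabel").iterate();           // run, result discarded
        System.out.println(graph.vertices.size()); // prints 2
    }
}
```

Applied to the benchmark in the question, this is why case (1) is so cheap: the subsequent commit has nothing to write. For a fair comparison, case (1) would need a terminal step as well, for example iterate() when the created vertex itself is not needed.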