Tags: gremlin, janusgraph, java-17, graph-traversal, orientdb3.0

Is There a Better Way to Delete Vertices in JanusGraph?


Is there a way to drop() all of the vertices in JanusGraph at once, the way OrientDB does?

g.V().drop().iterate() takes nearly 2 minutes (about 1 min 48 s in the log below) to delete 70005 simple vertices. Note that JanusGraph's JanusGraph interface extends JanusGraph's own Transaction interface, not TinkerPop Gremlin's Transaction; JanusGraph's Transaction in turn extends TinkerPop Gremlin's Graph interface.

2023-05-10 10:21:19,726 [INFO] [c.d.o.d.i.c.ContactPoints.main] ::   Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-10 10:21:19,803 [INFO] [c.d.o.d.i.c.DefaultMavenCoordinates.main] ::     DataStax Java driver for Apache Cassandra(R) (com.datastax.oss:java-driver-core) version 4.15.0
2023-05-10 10:21:20,226 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] ::   Using native clock for microsecond precision
2023-05-10 10:21:20,474 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] ::     [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/127.0.0.1:9042, hostId=null, hashCode=4bb4c89e)=null; please provide the correct local DC, or check your contact points
2023-05-10 10:21:20,688 [INFO] [o.j.g.i.UniqueInstanceIdRetriever.main] ::   Generated unique-instance-id=c0a8563c288-rmt-lap-win201
2023-05-10 10:21:20,703 [INFO] [c.d.o.d.i.c.ContactPoints.main] ::   Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-10 10:21:20,728 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] ::   Using native clock for microsecond precision
2023-05-10 10:21:20,760 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] ::     [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/127.0.0.1:9042, hostId=null, hashCode=1ec16fcd)=null; please provide the correct local DC, or check your contact points
2023-05-10 10:21:20,775 [INFO] [o.j.d.c.ExecutorServiceBuilder.main] ::  Initiated fixed thread pool of size 40
2023-05-10 10:21:20,876 [INFO] [o.j.g.d.StandardJanusGraph.main] ::  Gremlin script evaluation is disabled
2023-05-10 10:21:20,901 [INFO] [o.j.d.l.k.KCVSLog.main] ::   Loaded unidentified ReadMarker start time 2023-05-10T15:21:20.901765Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@f1d0004
2023-05-10 10:21:20,955 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-10 10:21:22,284 [INFO] [Main.main] ::    g.V().count().next():  70005
2023-05-10 10:21:22,287 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-10 10:23:09,867 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-10 10:23:10,914 [INFO] [Main.main] ::    g.V().count().next():  0
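        import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
        import org.janusgraph.core.JanusGraph;
        import org.janusgraph.core.JanusGraphFactory;
        // Open the graph against a local Cassandra (CQL) backend
        JanusGraph janusGraph = JanusGraphFactory.build().set("storage.backend", "cql").set("storage.hostname", "localhost:9042").open();
        GraphTraversalSource g = janusGraph.traversal();
        logger.info("g.V().count().next():\t" + g.V().count().next());
        // iterate() is the terminal step that actually executes the drop
        g.V().drop().iterate();
        logger.info("g.V().count().next():\t" + g.V().count().next());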

In OrientDB and Neo4j, I can just call g.V().drop() followed by orientGraph.commit(), and it's done instantly.

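        OrientGraph orientGraph = OrientGraph.open(configuration);
        GraphTraversalSource g = orientGraph.traversal();
        // No terminal step (e.g. iterate()) follows drop(), so in Java this traversal is built but not executed
        g.V().drop();
        orientGraph.commit();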

But when I try the same thing in JanusGraph, g.V().drop() followed by g.tx().commit() does not seem to work.
I've tried other variations and can't find the right sequence of steps:
there are 70003 vertices before and 70003 vertices after.

Connected to the target VM, address: '127.0.0.1:55234', transport: 'socket'
2023-05-10 08:05:21,875 [INFO] [c.d.o.d.i.c.ContactPoints.main] ::   Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-10 08:05:21,961 [INFO] [c.d.o.d.i.c.DefaultMavenCoordinates.main] ::     DataStax Java driver for Apache Cassandra(R) (com.datastax.oss:java-driver-core) version 4.15.0
2023-05-10 08:05:26,093 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] ::   Using native clock for microsecond precision
2023-05-10 08:05:28,441 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] ::     [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/[0:0:0:0:0:0:0:1]:9042, hostId=null, hashCode=3954ab39)=null; please provide the correct local DC, or check your contact points
2023-05-10 08:05:28,713 [INFO] [o.j.g.i.UniqueInstanceIdRetriever.main] ::   Generated unique-instance-id=c0a8563c24184-rmt-lap-win201
2023-05-10 08:05:28,731 [INFO] [c.d.o.d.i.c.ContactPoints.main] ::   Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-10 08:05:28,758 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] ::   Using native clock for microsecond precision
2023-05-10 08:05:30,236 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] ::     [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/127.0.0.1:9042, hostId=null, hashCode=31297484)=null; please provide the correct local DC, or check your contact points
2023-05-10 08:05:30,249 [INFO] [o.j.d.c.ExecutorServiceBuilder.main] ::  Initiated fixed thread pool of size 40
2023-05-10 08:05:30,350 [INFO] [o.j.g.d.StandardJanusGraph.main] ::  Gremlin script evaluation is disabled
2023-05-10 08:05:30,374 [INFO] [o.j.d.l.k.KCVSLog.main] ::   Loaded unidentified ReadMarker start time 2023-05-10T13:05:30.374192Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@378f002a
2023-05-10 08:05:30,437 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-10 08:05:31,861 [INFO] [Main.main] ::    g.V().count().next():  70003
2023-05-10 08:05:31,866 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-10 08:05:33,045 [INFO] [Main.main] ::    g.V().count().next():  70003
2023-05-10 08:05:34,523 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-10 08:05:34,530 [INFO] [Main.main] ::    vp[empty]
2023-05-10 08:05:34,531 [INFO] [Main.main] ::    vp[empty]
2023-05-10 08:05:34,534 [INFO] [Main.main] ::    vp[empty]
...
2023-05-10 08:05:34,530 [INFO] [Main.main] ::    vp[empty]
Disconnected from the target VM, address: '127.0.0.1:55234', transport: 'socket'

Process finished with exit code 0
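        import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
        import org.janusgraph.core.JanusGraph;
        import org.janusgraph.core.JanusGraphFactory;
        JanusGraph janusGraph = JanusGraphFactory.build().set("storage.backend", "cql").set("storage.hostname", "localhost:9042").open();
        GraphTraversalSource g = janusGraph.traversal();
        logger.info("g.V().count().next():\t" + g.V().count().next());
//        g.V().drop().iterate();
        // Without a terminal step such as iterate(), drop() only builds the traversal; nothing is deleted
        g.V().drop();
        g.tx().commit();
        logger.info("g.V().count().next():\t" + g.V().count().next());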
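The unchanged count is consistent with how Gremlin traversals behave in Java: g.V().drop() only builds a traversal, and nothing runs until a terminal step such as iterate(), next(), or toList() is called (the Gremlin Console auto-iterates, which can hide this). As a dependency-free analogy, not JanusGraph code, java.util.stream pipelines are lazy in the same way. The class LazyPipelineDemo and its methods are hypothetical names made up for this illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Analogy only: java.util.stream stands in for a Gremlin traversal here.
class LazyPipelineDemo {

    // Builds a pipeline with a side-effecting step but never runs it,
    // much like calling g.V().drop() without iterate().
    public static int touchedWithoutTerminalStep(List<Integer> ids) {
        List<Integer> touched = new ArrayList<>();
        Stream<Integer> pipeline = ids.stream().peek(touched::add);
        // No terminal operation was invoked, so peek never ran.
        return touched.size();
    }

    // Same pipeline, but forEach is a terminal operation (the analogue of iterate()),
    // so every element flows through the peek step.
    public static int touchedWithTerminalStep(List<Integer> ids) {
        List<Integer> touched = new ArrayList<>();
        ids.stream().peek(touched::add).forEach(id -> { });
        return touched.size();
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3);
        System.out.println(touchedWithoutTerminalStep(ids)); // prints 0
        System.out.println(touchedWithTerminalStep(ids));    // prints 3
    }
}
```

By the same reasoning, the JanusGraph snippet above would presumably need g.V().drop().iterate() before g.tx().commit() for the deletion to take effect.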

Solution

  • Thanks @HadoopMarc for working through this issue with me!
    I ended up using a while loop that deletes vertices in batches of 10,000 to work around JVM memory issues.

            // Drop vertices in batches of 10,000 until none remain
            while (g.V().count().next() > 0) {
                g.V().limit(10000).drop().iterate();
            }
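The batching idea in the solution can be factored into a small helper. This is a sketch under assumptions: BatchedDelete and deleteInBatches are hypothetical names, and committing after every batch is an added assumption that the solution above does not make, intended to keep each transaction's pending changes small. The callbacks are left abstract so the loop can be exercised without a database; with JanusGraph they might be wired to () -> g.V().hasNext() (which stops at the first vertex and so may be cheaper than counting every vertex on each pass), () -> g.V().limit(10000).drop().iterate(), and () -> g.tx().commit(), but that wiring is untested here.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper: not part of JanusGraph or TinkerPop.
class BatchedDelete {

    // Runs dropBatch and then commit until hasMore reports that nothing is left.
    // Returns the number of batches executed.
    public static int deleteInBatches(BooleanSupplier hasMore, Runnable dropBatch, Runnable commit) {
        int batches = 0;
        while (hasMore.getAsBoolean()) {
            dropBatch.run(); // delete at most one batch of elements
            commit.run();    // persist this batch before starting the next (assumption)
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the graph: 25 "vertices", deleted 10 at a time.
        int[] remaining = {25};
        int[] commits = {0};
        int batches = deleteInBatches(
                () -> remaining[0] > 0,
                () -> remaining[0] -= Math.min(10, remaining[0]),
                () -> commits[0]++);
        System.out.println(batches + " batches, " + commits[0] + " commits, " + remaining[0] + " left");
    }
}
```

Passing the loop condition and the batch step in as callbacks keeps the control flow separate from the graph API, so the batch size and commit policy can be tuned without touching the loop.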