Search code examples
full-text-searchgremlincassandra-3.0janusgraphtinkerpop3

Thought I Created A JanusGraph Vertex Index But I Guess Not; What Went Wrong?


Why is JanusGraph not finding-and-using the index I made at time of search?

Please note, I need this to work with the Java implementation specifically; not Gremlin Console.
And this needs to work with the latest JanusGraph releases; not older versions,
Especially not with different major-version release-numbers.

[WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
janusGraphManagement.buildIndex("_id", Vertex.class).addKey(propertyKey).buildCompositeIndex();

So I expect for class Vertex (which is all of vertexes) to be using the index.
But what I actually see is the warning that this is not so; and
I experience is data returned and changed to be slower than expected.

2023-05-09 15:18:38,678 [INFO] [c.d.o.d.i.c.ContactPoints.main] ::   Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-09 15:18:38,759 [INFO] [c.d.o.d.i.c.DefaultMavenCoordinates.main] ::     DataStax Java driver for Apache Cassandra(R) (com.datastax.oss:java-driver-core) version 4.15.0
2023-05-09 15:18:39,215 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] ::   Using native clock for microsecond precision
2023-05-09 15:18:39,468 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] ::     [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/[0:0:0:0:0:0:0:1]:9042, hostId=null, hashCode=60cfcdc3)=null; please provide the correct local DC, or check your contact points
2023-05-09 15:18:39,691 [INFO] [o.j.g.i.UniqueInstanceIdRetriever.main] ::   Generated unique-instance-id=c0a8563c21376-rmt-lap-win201
2023-05-09 15:18:39,707 [INFO] [c.d.o.d.i.c.ContactPoints.main] ::   Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-09 15:18:39,732 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] ::   Using native clock for microsecond precision
2023-05-09 15:18:39,766 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] ::     [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/127.0.0.1:9042, hostId=null, hashCode=3d0c7306)=null; please provide the correct local DC, or check your contact points
2023-05-09 15:18:39,786 [INFO] [o.j.d.c.ExecutorServiceBuilder.main] ::  Initiated fixed thread pool of size 40
2023-05-09 15:18:39,897 [INFO] [o.j.g.d.StandardJanusGraph.main] ::  Gremlin script evaluation is disabled
2023-05-09 15:18:39,919 [INFO] [o.j.d.l.k.KCVSLog.main] ::   Loaded unidentified ReadMarker start time 2023-05-09T20:18:39.919098Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@f1d0004
2023-05-09 15:18:39,971 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-09 15:18:41,311 [INFO] [Main.main] ::    g.V().count().next():  69999
2023-05-09 15:18:41,314 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes

The index seems to exist, because trying to create will cause IllegalArgumentException: An index with name '_id' has already been defined.
I don't delete the index in-between runs, so there should be more than enough time for JanusGraph to create the index, especially if I let the container run overnight.

...

2023-05-09 15:21:38,946 [INFO] [o.j.d.c.ExecutorServiceBuilder.main] ::  Initiated fixed thread pool of size 40
2023-05-09 15:21:39,053 [INFO] [o.j.g.d.StandardJanusGraph.main] ::  Gremlin script evaluation is disabled
2023-05-09 15:21:39,079 [INFO] [o.j.d.l.k.KCVSLog.main] ::   Loaded unidentified ReadMarker start time 2023-05-09T20:21:39.079040Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@f1d0004
Exception in thread "main" java.lang.IllegalArgumentException: An index with name '_id' has already been defined
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:220)
    at org.janusgraph.graphdb.database.management.ManagementSystem.checkIndexName(ManagementSystem.java:661)
    at org.janusgraph.graphdb.database.management.ManagementSystem.createCompositeIndex(ManagementSystem.java:728)
    at org.janusgraph.graphdb.database.management.ManagementSystem.access$300(ManagementSystem.java:130)
    at org.janusgraph.graphdb.database.management.ManagementSystem$IndexBuilder.buildCompositeIndex(ManagementSystem.java:824)
    at Test3.main(Test3.java:24)

I've done this before with OrientDB and thought I understood how to work with Tinkerpop3 + Gremlin-related graph-databases then. But as I'm transitioning to JanusGraph from OrientDB they don't seem to have the same process of using indexes after they've been made.

        OrientGraph orientGraph = OrientGraph.open(configuration);
        if (!orientGraph.getVertexIndexedKeys("V").contains("_id"))
            orientGraph.createVertexIndex("_id", "V", configuration);
        orientGraph.commit();

Update

Update 1

JanusGraph never seems to make the index requested; same thing as before but now:

...

2023-05-11 07:49:16,383 [INFO] [o.j.d.l.k.KCVSLog.main] ::   Loaded unidentified ReadMarker start time 2023-05-11T12:49:16.382971Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@6ad16c5d
2023-05-11 07:49:16,512 [INFO] [o.j.g.d.m.GraphIndexStatusWatcher.main] ::   Some key(s) on index _id do not currently have status(es) [REGISTERED]: _id=ENABLED

...

2023-05-11 07:50:16,913 [INFO] [o.j.g.d.m.GraphIndexStatusWatcher.main] ::   Timed out (PT1M) while waiting for index _id to converge on status(es) [REGISTERED]
2023-05-11 07:50:16,998 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-11 07:50:17,020 [INFO] [Main.main] ::    g.V().count().next():  0
2023-05-11 07:50:17,025 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-11 07:50:17,032 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-11 07:50:17,038 [INFO] [Main.main] ::    g.V().count().next():  0

Process finished with exit code 0

Awaiting now for JanusGraph to create index by casting JanusGraph.openManagement() from JanusGraphManagement to ManagementSystem.

I've already tried variations of janusGraphManagement.rollback() even though this is a brand-new container with no data. Here's the log output to show what messages I get:

...

2023-05-11 08:16:16,302 [INFO] [o.j.d.l.k.KCVSLog.main] ::   Loaded unidentified ReadMarker start time 2023-05-11T13:16:16.302033Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@f1d0004
Exception in thread "main" java.lang.IllegalStateException: Operation cannot be executed because the enclosing transaction is closed
    at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.verifyOpen(StandardJanusGraphTx.java:369)
    at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.getRelationType(StandardJanusGraphTx.java:1080)
    at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.getOrCreatePropertyKey(StandardJanusGraphTx.java:1142)
    at org.janusgraph.graphdb.database.management.ManagementSystem.getOrCreatePropertyKey(ManagementSystem.java:1478)
    at Test7.main(Test7.java:20)
...

2023-05-11 08:18:50,029 [INFO] [o.j.d.l.k.KCVSLog.main] ::   Loaded unidentified ReadMarker start time 2023-05-11T13:18:50.029694Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@f1d0004
Exception in thread "main" java.lang.IllegalStateException: This management system instance has been closed
    at com.google.common.base.Preconditions.checkState(Preconditions.java:502)
    at org.janusgraph.graphdb.database.management.ManagementSystem.ensureOpen(ManagementSystem.java:228)
    at org.janusgraph.graphdb.database.management.ManagementSystem.commit(ManagementSystem.java:233)
    at Test7.main(Test7.java:17)

Code

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.PropertyKey;
import org.janusgraph.graphdb.database.management.ManagementSystem;

public class Test7 {
    private static final Logger logger = LogManager.getLogger(Main.class);

    public static void main(String[] args) throws InterruptedException {
        JanusGraph janusGraph = JanusGraphFactory.build().set("storage.backend", "cql").set("storage.hostname", "localhost:19042").open();
        ManagementSystem janusGraphManagement = (ManagementSystem) janusGraph.openManagement();
        janusGraphManagement.rollback();
        janusGraphManagement = (ManagementSystem) janusGraph.openManagement();
        PropertyKey propertyKey = janusGraphManagement.getOrCreatePropertyKey("_id");
        if (!janusGraphManagement.containsGraphIndex("_id"))
            janusGraphManagement.buildIndex("_id", Vertex.class).addKey(propertyKey).buildCompositeIndex();
        janusGraphManagement.commit();
        ManagementSystem.awaitGraphIndexStatus(janusGraph, "_id").call();
//        janusGraphManagement.updateIndex(janusGraphManagement.getGraphIndex("_id"), SchemaAction.REINDEX);
        GraphTraversalSource g = janusGraph.traversal();
        logger.info("g.V().count().next():\t" + g.V().count().next());
        g.V().drop().iterate();
        logger.info("g.V().count().next():\t" + g.V().count().next());
        janusGraph.close();
    }
}

Solution

  • Short answer: JanusGraph wants it SchemaStatus.ENABLED.
    Which I can get ENABLED, but the issue still persists for Cassandra.

    [INFO] [o.j.g.d.m.GraphIndexStatusWatcher.main] ::   Some key(s) on index _id do not currently have status(es) [REGISTERED]: _id=ENABLED
    

    Logs

    ...
    
    2023-05-15 09:58:05,984 [INFO] [o.j.d.l.k.KCVSLog.main] ::   Loaded unidentified ReadMarker start time 2023-05-15T14:58:05.984686Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@48b4a043
    2023-05-15 09:58:06,047 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
    2023-05-15 09:58:06,270 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[~label = entity]]. For better performance, use indexes
    2023-05-15 09:58:06,275 [INFO] [Main.main] ::    drop g.V().hasLabel("entity").count().next():  0
    2023-05-15 09:58:07,020 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[~label = entity]]. For better performance, use indexes
    2023-05-15 09:58:07,033 [INFO] [Main.main] ::    addVertex g.V().hasLabel("entity").count().next(): 1
    2023-05-15 09:58:07,046 [INFO] [o.j.g.d.m.GraphIndexStatusWatcher.main] ::   Some key(s) on index _id do not currently have status(es) [REGISTERED]: _id=ENABLED
    
    ...
    
    [o.j.g.d.m.GraphIndexStatusWatcher.main] ::  Some key(s) on index _id do not currently have status(es) [REGISTERED]: _id=ENABLED
    2023-05-15 09:59:07,520 [INFO] [o.j.g.d.m.GraphIndexStatusWatcher.main] ::   Timed out (PT1M) while waiting for index _id to converge on status(es) [REGISTERED]
    2023-05-15 09:59:07,522 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[~label = entity]]. For better performance, use indexes
    2023-05-15 09:59:07,531 [INFO] [Main.main] ::    awaitGraphIndexStatus g.V().hasLabel("entity").count().next(): 1
    2023-05-15 09:59:07,627 [INFO] [o.j.g.o.j.IndexRepairJob.Thread-68] ::   Index _id metrics: success-tx: 2 doc-updates: 0 succeeded: 1
    
    ...
    
    2023-05-15 09:59:08,297 [INFO] [o.j.g.d.m.ManagementSystem.Thread-51] ::     Index update job successful for [_id]
    2023-05-15 09:59:08,299 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[~label = entity]]. For better performance, use indexes
    2023-05-15 09:59:08,304 [INFO] [Main.main] ::    updateIndex g.V().hasLabel("entity").count().next():   1
    
    Process finished with exit code 0
    

    Code

    public class Test {
        private static final Logger logger = LogManager.getLogger(Main.class);
        public static void main(String[] args) throws InterruptedException, ExecutionException {
            JanusGraph janusGraph = JanusGraphFactory.build().set("storage.backend", "cql").set("storage.hostname", "localhost:9042").open();
            GraphTraversalSource g = janusGraph.traversal();
            g.V().drop().iterate();
            janusGraph.tx().commit();
            janusGraph.tx().open();
            logger.info("drop g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
            JanusGraphVertex vertex = janusGraph.addVertex("entity");
            vertex.property("_id", "Test1");
            janusGraph.tx().commit();
            janusGraph.tx().open();
            logger.info("addVertex g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
            JanusGraphManagement janusGraphManagement = janusGraph.openManagement();
            PropertyKey propertyKey = janusGraphManagement.getOrCreatePropertyKey("_id");
            if (!janusGraphManagement.containsGraphIndex("_id"))
                janusGraphManagement.buildIndex("_id", Vertex.class).addKey(propertyKey).buildCompositeIndex();
            janusGraphManagement.commit();
            janusGraphManagement = janusGraph.openManagement();
            GraphIndexStatusReport report = ManagementSystem.awaitGraphIndexStatus(janusGraph, "_id").call();
            logger.info("awaitGraphIndexStatus g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
            JanusGraphIndex test = janusGraphManagement.getGraphIndex("_id");
            janusGraphManagement.updateIndex(test, SchemaAction.REINDEX).get();
            janusGraphManagement.commit();
            logger.info("updateIndex g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
            janusGraph.close();
        }
    }