Tags: gremlin, cassandra-3.0, janusgraph, cqlsh, java-17

Where Are JanusGraph's Vertexes In Cassandra?


Where does JanusGraph store its vertexes in Apache Cassandra?

When I look for them I can't find them. I've tried using cqlsh, but I'm new to CQL (Cassandra Query Language) and to Cassandra in general, so please forgive my lack of knowledge. I'm also still learning the Gremlin query language that JanusGraph uses.

Reproduction

Steps

  1. Create and start a Cassandra Docker container
    docker run --name jg-cassandra -d -e CASSANDRA_START_RPC=true -p 9160:9160 -p 9042:9042 -p 7199:7199 -p 7001:7001 -p 7000:7000 cassandra:3.11
    
  2. Create and run this Java + Maven project (code below)
  3. Results
    1. Expected: a table of vertexes that I can read from, as I would expect from something like OrientDB or Neo4j
    2. Actual: no table with a name like V or vertex exists

Logs

Cassandra [Docker container] Terminal

cqlsh> desc keyspaces;

system_schema  system      system_distributed
system_auth    janusgraph  system_traces     

cqlsh> use janusgraph;

cqlsh:janusgraph> desc tables

edgestore_lock_  graphindex_lock_         janusgraph_ids   
txlog            systemlog                graphindex       
edgestore        system_properties_lock_  system_properties

Log4j2 STDOUT Logs

2023-05-09 11:01:53,970 [INFO] [c.d.o.d.i.c.ContactPoints.main] ::   Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-09 11:01:54,072 [INFO] [c.d.o.d.i.c.DefaultMavenCoordinates.main] ::     DataStax Java driver for Apache Cassandra(R) (com.datastax.oss:java-driver-core) version 4.15.0
2023-05-09 11:01:54,652 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] ::   Using native clock for microsecond precision
2023-05-09 11:01:54,967 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] ::     [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/[0:0:0:0:0:0:0:1]:9042, hostId=null, hashCode=7b9f753a)=null; please provide the correct local DC, or check your contact points
2023-05-09 11:01:55,209 [INFO] [o.j.g.i.UniqueInstanceIdRetriever.main] ::   Generated unique-instance-id=c0a8563c1700-rmt-lap-win201
2023-05-09 11:01:55,231 [INFO] [c.d.o.d.i.c.ContactPoints.main] ::   Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-09 11:01:55,265 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] ::   Using native clock for microsecond precision
2023-05-09 11:01:55,322 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] ::     [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/127.0.0.1:9042, hostId=null, hashCode=1dc6cab9)=null; please provide the correct local DC, or check your contact points
2023-05-09 11:01:55,341 [INFO] [o.j.d.c.ExecutorServiceBuilder.main] ::  Initiated fixed thread pool of size 40
2023-05-09 11:01:55,447 [INFO] [o.j.g.d.StandardJanusGraph.main] ::  Gremlin script evaluation is disabled
2023-05-09 11:01:55,473 [INFO] [o.j.d.l.k.KCVSLog.main] ::   Loaded unidentified ReadMarker start time 2023-05-09T16:01:55.473310Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@ff23ae7

Process finished with exit code 0

Code

Main.java simplified

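import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.JanusGraphVertex;
import org.janusgraph.core.PropertyKey;
import org.janusgraph.core.schema.JanusGraphManagement;

public class Main {
    public static void main(String[] args) {
        // Connect to the Cassandra container through the CQL storage backend.
        JanusGraph janusGraph = JanusGraphFactory.build().set("storage.backend", "cql").set("storage.hostname", "localhost:9042").open();
        // Schema change: register the "_id" property key, then commit the management transaction.
        JanusGraphManagement janusGraphManagement = janusGraph.openManagement();
        PropertyKey propertyKey = janusGraphManagement.getOrCreatePropertyKey("_id");
        janusGraphManagement.commit();
        // Add two vertexes, each with a "test" property, committing after each one.
        JanusGraphVertex janusGraphVertex = janusGraph.addVertex();
        janusGraphVertex.property("test","test");
        janusGraph.tx().commit();
        janusGraphVertex = janusGraph.addVertex();
        janusGraphVertex.property("test","test2");
        janusGraph.tx().commit();
        janusGraph.close();
    }
}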

pom.xml snippet

    <dependencies>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-slf4j2-impl</artifactId>
            <version>2.20.0</version>
        </dependency>
        <dependency>
            <groupId>org.janusgraph</groupId>
            <artifactId>janusgraph-cql</artifactId>
            <version>1.0.0-20230504-014643.988c094</version>
        </dependency>
    </dependencies>

log4j2.xml focused

<Configuration>
    <Appenders>
        <Console name="STDOUT" target="SYSTEM_OUT">
            <PatternLayout>
                <Pattern>%d [%p] [%c{1.}.%t] ::&#x09; %m%n</Pattern>
            </PatternLayout>
        </Console>
    </Appenders>
    <Loggers>
        <Root level="info">
            <AppenderRef ref="STDOUT"/>
        </Root>
    </Loggers>
</Configuration>

Resources

  1. Maven 3.8.1
  2. Java 11.0.19 (corretto-11)
  3. JanusGraph 1.0.0-20230504-014643.988c094
  4. Windows 10
  5. Docker
  6. Cassandra:latest [container]
  7. ICIJ Offshore Dataleaks

Solution

  • As you noticed, JanusGraph creates several tables when it starts up. All of the primary graph data, vertexes included, is stored as wide rows in the edgestore table. However, the contents of these tables are largely opaque binary blobs, so you will not be able to query them meaningfully from CQL.

    The way that the edgestore table is constructed is discussed here
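
    To make "wide rows of opaque blobs" a bit more concrete, here is a rough toy model in plain Java. It is only a sketch under loud assumptions: the class name WideRowSketch, the vertex ids 4104 and 4200, and the UTF-8 encoding of keys and values are all made up for illustration, and this is not JanusGraph's real serialization format. What it tries to show is the general shape: one row per vertex, keyed by the vertex id, with one cell per property, and everything stored as bytes that print as hex.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.TreeMap;

/** Toy model of a wide-row store. NOT JanusGraph's actual encoding. */
public class WideRowSketch {

    // Row key -> sorted cells (column -> value). Bytes are kept as hex
    // strings so they sort and print the way a blob column would display.
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    /** Renders a byte array as a hex literal such as 0x74657374. */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder("0x");
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** Stores one property as one cell in the row that belongs to the vertex. */
    public void putProperty(long vertexId, String key, String value) {
        // Assumption for illustration: the row key is the 8-byte vertex id.
        String rowKey = hex(ByteBuffer.allocate(Long.BYTES).putLong(vertexId).array());
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .put(hex(key.getBytes(StandardCharsets.UTF_8)),
                 hex(value.getBytes(StandardCharsets.UTF_8)));
    }

    public TreeMap<String, TreeMap<String, String>> rows() {
        return rows;
    }

    public static void main(String[] args) {
        WideRowSketch store = new WideRowSketch();
        // Hypothetical vertex ids standing in for the two vertexes in the question.
        store.putProperty(4104L, "test", "test");
        store.putProperty(4200L, "test", "test2");

        // Prints:
        // 0x0000000000001008 | 0x74657374 | 0x74657374
        // 0x0000000000001068 | 0x74657374 | 0x7465737432
        store.rows().forEach((rowKey, cells) ->
            cells.forEach((column, value) ->
                System.out.println(rowKey + " | " + column + " | " + value)));
    }
}
```

    Nothing in that output looks like a "V" table, which is roughly why browsing edgestore from cqlsh is not helpful. To read the vertexes back, go through JanusGraph itself, for example with a Gremlin traversal such as janusGraph.traversal().V().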