Search code examples
gremlingraph-databasesolapspark-cassandra-connectorjanusgraph

Janusgraph OLAP traversal - Connection with cassandra using trusstore config not working


We have a working setup of Janusgraph 0.5.2 version where we are able to insert and query (OLTP) the data as per need. We are exploring JanusGraph OLAP traversal for some reporting and analytical requirements. However when I try to follow the instructions provided on the JanusGraph documentation, we are not able connect to Cassandra when we try to traverse the graph. Cassandra is setup on SSL connection with a Truststore on client side. This configuration connection with Cassandra from Gremlin Console works fine with OLTP traversals.

Below is config for OLTP which works janusgraph-cql-oltp.properties:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cql
storage.hostname=cassandra.cassandra.svc.cluster.local
storage.username=cassandra
storage.password=cassandra123
storage.cql.keyspace=janusgraph
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5
storage.lock.wait-time = 60000
storage.cql.ssl.enabled=true
storage.cql.ssl.truststore.location=/etc/config/tls/gremlin/client/truststore
storage.cql.ssl.truststore.password=secretpasswd

When I load this line in gremlin console to connect and traverse a simple query I am able to.

Below is the config for OLAP which is showing error for connection to Cassandra:

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat

gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true
# # JanusGraph Cassandra InputFormat configuration
# # These properties defines the connection properties which were used while write data to JanusGraph.
janusgraphmr.ioformat.conf.storage.backend=cql
# This specifies the hostname & port for Cassandra data store.
janusgraphmr.ioformat.conf.storage.hostname=cassandra.cassandra.svc.cluster.local
janusgraphmr.ioformat.conf.storage.port=9042
janusgraphmr.ioformat.conf.storage.username=cassandra
janusgraphmr.ioformat.conf.storage.password=cassandra123
janusgraphmr.ioformat.conf.storage.cql.keyspace=janusgraph
janusgraphmr.ioformat.conf.storage.lock.wait-time = 60000
janusgraphmr.ioformat.conf.storage.cql.ssl.enabled=true
janusgraphmr.ioformat.conf.storage.cql.ssl.truststore.location=/etc/config/tls/gremlin/client/truststore
janusgraphmr.ioformat.conf.storage.cql.ssl.truststore.password=cassandra123

janusgraphmr.ioformat.conf.storage.ssl.enabled=true
janusgraphmr.ioformat.conf.storage.ssl.truststore.location=/etc/config/tls/gremlin/client/truststore
janusgraphmr.ioformat.conf.storage.ssl.truststore.password=cassandra123

janusgraphmr.ioformat.conf.storage.cql.read-consistency-level=ONE

storage.lock.wait-time = 60000
storage.cql.ssl.enabled=true
storage.cql.ssl.client-authentication-enabled=true
storage.cql.ssl.truststore.location=/etc/config/tls/gremlin/client/truststore
storage.cql.ssl.truststore.password=cassandra123

janusgraphmr.ioformat.conf.cache.db-cache = true
janusgraphmr.ioformat.conf.cache.db-cache-clean-wait = 20
janusgraphmr.ioformat.conf.cache.db-cache-time = 180000
janusgraphmr.ioformat.conf.cache.db-cache-size = 0.5

cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.widerows=true

# # SparkGraphComputer Configuration #
spark.master=local[*]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator

When i load the graph object in the gremlin console, i can see the properties are loaded correctly. But when i traverse the graph as mentioned in the documentation, I get cassandra connection error related to ssl config.

gremlin> graph=HadoopGraph.open('/janusgraph-full-0.5.2/conf/olap.properties')
==>hadoopgraph[cqlinputformat->nulloutputformat]
gremlin> g=graph.traversal().withComputer(SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[cqlinputformat->nulloutputformat], sparkgraphcomputer]
gremlin> graph.configuration()
//// i can see all the properties from the file loaded here
gremlin> g.V().limit(1)
07:34:44 WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer  - class org.apache.hadoop.mapreduce.lib.output.NullOutputFormat does not implement PersistResultGraphAware and thus, persistence options are unknown -- assuming all options are possible
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: cassandra.cassandra.svc.cluster.local/10.0.165.158:9042 (com.datastax.driver.core.exceptions.TransportException: [cassandra.cassandra.svc.cluster.local/10.0.165.158:9042] Connection has been closed))
Type ':help' or ':h' for help.

I can verify from my cassandra logs that a connection was attempted but request was rejected for ssl reasons. Below are the logs from cassandra instance:

INFO  [epollEventLoopGroup-2-4] 2023-05-02 07:34:58,809 Message.java:826 - Unexpected exception during request; channel = [id: 0xeb0e017f, L:/10.12.0.224:9042 ! R:/10.12.0.135:60316]
io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 0400000001000000500003000b43514c5f56455253494f4e0005332e302e30000e4452495645525f56455253494f4e0005332e392e30000b4452495645525f4e414d4500144461746153746178204a61766120447269766572
        at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1057) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:411) [netty-all-4.0.44.Final.jar:4.0.44.Final]

Can someone help on the way SSL configuration needs to be passed to Gremlin for OLAP traversal on JanusGraph?


Solution

  • Can you try the following config options instead?

    cassandra.input.native.ssl.trust.store.path
    cassandra.input.native.ssl.trust.store.password
    

    You can find how they are used from CqlConfigHelper.java. Unfortunately, the official doc does not mention this.