Search code examples
javaamazon-ec2cassandrajanusgraph

Can not acquire write lock on EC2-based Cassandra backend


I have deployed Cassandra 3.11.1 on an EC2 instance and connecting to it from my machine using the JanusGraph Java API. For testing purposes, the instance allows all inbound and outbound TCP connections.

In the cassandra.yaml file I have configured the following values like so:

  • listen_address: private EC2 address
  • broadcast_address: public EC2 address
  • seeds: public EC2 address
  • start_rpc: true
  • rpc_address: private EC2 address
  • endpoint_snitch: Ec2MultiRegionSnitch

Note that when I want to use the public EC2 address as rpc_address, Cassandra startup fails with this message:

Failed to bind port 9042 on <public EC2 address>

Finally I try to connect to the graph, add a vertex and commit.

JanusGraph graph = JanusGraphFactory.build()
        .set("storage.backend", "cassandra")
        .set("storage.hostname", "<public EC2 address>")
        .set("storage.cassandra.keyspace", "debug")
        .set("storage.batch-loading", false)

        .set("query.force-index", false)
        .set("query.fast-property", true)

        .set("cache.db-cache", true)
        .set("schema.default", "default")

        .set("index.search.backend", "elasticsearch")
        .set("index.search.hostname", "127.0.0.1")
        .set("index.search.elasticsearch.interface", "REST_CLIENT")
        .open();

graph.addVertex("myLabel");
graph.tx().commit();

Committing fails with the following error:

09:41:53.297 [main] ERROR o.j.g.database.StandardJanusGraph - Could not commit transaction [1] due to exception
org.janusgraph.diskstorage.locking.TemporaryLockingException: Temporary locking failure
    at org.janusgraph.diskstorage.locking.AbstractLocker.writeLock(AbstractLocker.java:309) ~[janusgraph-core-0.2.0.jar:na]
    at org.janusgraph.diskstorage.locking.consistentkey.ExpectedValueCheckingStore.acquireLock(ExpectedValueCheckingStore.java:103) ~[janusgraph-core-0.2.0.jar:na]
    at org.janusgraph.diskstorage.keycolumnvalue.KCVSProxy.acquireLock(KCVSProxy.java:52) ~[janusgraph-core-0.2.0.jar:na]
    at org.janusgraph.diskstorage.BackendTransaction.acquireIndexLock(BackendTransaction.java:256) ~[janusgraph-core-0.2.0.jar:na]
    at org.janusgraph.graphdb.database.StandardJanusGraph.prepareCommit(StandardJanusGraph.java:572) ~[janusgraph-core-0.2.0.jar:na]
    at org.janusgraph.graphdb.database.StandardJanusGraph.commit(StandardJanusGraph.java:702) ~[janusgraph-core-0.2.0.jar:na]
    at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.commit(StandardJanusGraphTx.java:1374) [janusgraph-core-0.2.0.jar:na]
    at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsGraph$GraphTransaction.doCommit(JanusGraphBlueprintsGraph.java:272) [janusgraph-core-0.2.0.jar:na]
    at org.apache.tinkerpop.gremlin.structure.util.AbstractTransaction.commit(AbstractTransaction.java:105) [gremlin-core-3.2.6.jar:3.2.6]
    at engineering.divine.core.test.ConnectionTest.test(ConnectionTest.java:29) [classes/:na]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_45]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_45]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_45]
    at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_45]
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) [junit-4.12.jar:4.12]
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) [junit-4.12.jar:4.12]
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) [junit-4.12.jar:4.12]
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) [junit-4.12.jar:4.12]
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) [junit-4.12.jar:4.12]
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) [junit-4.12.jar:4.12]
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) [junit-4.12.jar:4.12]
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) [junit-4.12.jar:4.12]
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) [junit-4.12.jar:4.12]
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) [junit-4.12.jar:4.12]
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) [junit-4.12.jar:4.12]
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) [junit-4.12.jar:4.12]
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363) [junit-4.12.jar:4.12]
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86) [.cp/:na]
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) [.cp/:na]
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459) [.cp/:na]
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:678) [.cp/:na]
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382) [.cp/:na]
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192) [.cp/:na]
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Lock write retry count exceeded
    at org.janusgraph.diskstorage.locking.consistentkey.ConsistentKeyLocker.writeSingleLock(ConsistentKeyLocker.java:341) ~[janusgraph-core-0.2.0.jar:na]
    at org.janusgraph.diskstorage.locking.consistentkey.ConsistentKeyLocker.writeSingleLock(ConsistentKeyLocker.java:125) ~[janusgraph-core-0.2.0.jar:na]
    at org.janusgraph.diskstorage.locking.AbstractLocker.writeLock(AbstractLocker.java:304) ~[janusgraph-core-0.2.0.jar:na]
    ... 32 common frames omitted

I would appreciate any pointers on what could be wrong.


Solution

  • The network latency between my local machine and the EC2 instance was in fact too high to acquire a lock in time. Increasing the storage.lock.wait-time from 100ms to 1s resolved the issue.