Search code examples
javajanusgraphtinkerpop3berkeley-db

JanusGraph image stop responding after query timeout, how to prevent it?


When JanusGraph server exceeded his 'evaluationTimeout' the server stop responding

I'm using the default docker image janusgraph/janusgraph:latest (Berkeley and Lucene) and connecting with gremlin console

After the first timeout accord, queries that worked before getting the same response:

Evaluation exceeded the configured 'evaluationTimeout' threshold of 30000 ms or evaluation was otherwise cancelled directly for request [g.V().limit(4).valueMap()]: null - try increasing the timeout with the :remote command

server error:


java.util.concurrent.TimeoutException: Evaluation exceeded the configured 'evaluationTimeout' threshold of 30000 ms or evaluation was otherwise cancelled directly for request [g.V()]
        at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$1(GremlinExecutor.java:316)
        at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
        at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at java.lang.Thread.run(Thread.java:748)
1318978 [pool-6-thread-1] WARN  org.janusgraph.diskstorage.log.kcvs.KCVSLog  - Could not read messages for timestamp [2020-05-24T10:12:30.449Z] (this read will be retried)
org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:56)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:158)
        at org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller.run(KCVSLog.java:725)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.janusgraph.diskstorage.PermanentBackendException: Could not start BerkeleyJE transaction
        at org.janusgraph.diskstorage.berkeleyje.BerkeleyJEStoreManager.beginTransaction(BerkeleyJEStoreManager.java:163)
        at org.janusgraph.diskstorage.berkeleyje.BerkeleyJEStoreManager.beginTransaction(BerkeleyJEStoreManager.java:47)
        at org.janusgraph.diskstorage.keycolumnvalue.keyvalue.OrderedKeyValueStoreManagerAdapter.beginTransaction(OrderedKeyValueStoreManagerAdapter.java:68)
        at org.janusgraph.diskstorage.log.kcvs.KCVSLog.openTx(KCVSLog.java:319)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:145)
        at org.janusgraph.diskstorage.util.BackendOperation$1.call(BackendOperation.java:161)
        at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:68)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:54)
        ... 9 more
Caused by: com.sleepycat.je.ThreadInterruptedException: (JE 18.3.12) Environment must be closed, caused by: com.sleepycat.je.ThreadInterruptedException: Environment invalid because of previous exception: (JE 18.3.12) /var/lib/janusgraph/data java.lang.InterruptedException THREAD_INTERRUPTED: InterruptedException may cause incorrect internal state, unable to continue. Environment is invalid and must be closed.
        at com.sleepycat.je.ThreadInterruptedException.wrapSelf(ThreadInterruptedException.java:105)
        at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1835)
        at com.sleepycat.je.dbi.EnvironmentImpl.checkOpen(EnvironmentImpl.java:1844)
        at com.sleepycat.je.Environment.checkOpen(Environment.java:2697)
        at com.sleepycat.je.Environment.beginTransactionInternal(Environment.java:1409)
        at com.sleepycat.je.Environment.beginTransaction(Environment.java:1383)
        at org.janusgraph.diskstorage.berkeleyje.BerkeleyJEStoreManager.beginTransaction(BerkeleyJEStoreManager.java:146)
        ... 16 more
Caused by: com.sleepycat.je.ThreadInterruptedException: Environment invalid because of previous exception: (JE 18.3.12) /var/lib/janusgraph/data java.lang.InterruptedException THREAD_INTERRUPTED: InterruptedException may cause incorrect internal state, unable to continue. Environment is invalid and must be closed.
        at com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:67)
        at com.sleepycat.je.tree.IN.latch(IN.java:547)
        at com.sleepycat.je.dbi.CursorImpl.latchBIN(CursorImpl.java:402)
        at com.sleepycat.je.dbi.CursorImpl.cloneCursor(CursorImpl.java:230)
        at com.sleepycat.je.Cursor.beginMoveCursor(Cursor.java:5252)
        at com.sleepycat.je.Cursor.beginMoveCursor(Cursor.java:5259)
        at com.sleepycat.je.Cursor.retrieveNextNoDups(Cursor.java:3550)
        at com.sleepycat.je.Cursor.retrieveNext(Cursor.java:3312)
        at com.sleepycat.je.Cursor.getInternal(Cursor.java:1313)
        at com.sleepycat.je.Cursor.get(Cursor.java:1244)
        at com.sleepycat.je.Cursor.getNext(Cursor.java:1512)


Solution

  • This JanusGraph issue explains the problem in reference to traversal interruption with TinkerPop's test suite:

    https://github.com/JanusGraph/janusgraph/issues/171

    Since interrupts are not supported in Berkeley JE (see BerkeleyJEStoreManager and Berkeley JE FAQ) there are internal state errors in all tests after that suite runs.

    One reason TinkerPop has that interruption test because it validates that Gremlin Server can use that interrupt function to cancel server based traversals successfully. It looks like Berkeley simply can't support that. You would need to use a different JanusGraph backend.