Cassandra nodetool repair - out of memory error


I have a DataStax Cassandra 2.2.3 cluster with only one node, and as a test I'm adding a new node. After successfully adding the new node and starting it with bootstrap=false, I'm trying to rebalance it with nodetool repair.

However, this error shows up in the logs of the old node:

ERROR [SharedPool-Worker-142] 2015-10-30 14:02:41,993 JVMStabilityInspector.java:117 - JVM state determined to be unstable.  Exiting forcefully due to:
java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.7.0_80]
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:331) ~[na:1.7.0_80]
    at org.apache.cassandra.utils.memory.SlabAllocator.getRegion(SlabAllocator.java:137) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.utils.memory.SlabAllocator.allocate(SlabAllocator.java:97) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.utils.memory.ContextAllocator.allocate(ContextAllocator.java:57) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.utils.memory.ContextAllocator.clone(ContextAllocator.java:47) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.utils.memory.MemtableBufferAllocator.clone(MemtableBufferAllocator.java:61) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.db.Memtable.put(Memtable.java:209) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1244) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:406) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:366) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.db.Mutation.apply(Mutation.java:214) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.db.MutationVerbHandler.doVerb(MutationVerbHandler.java:50) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_80]
    at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.2.3.jar:2.2.3]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]

and this:

ERROR [SharedPool-Worker-126] 2015-10-30 14:02:04,049 SEPWorker.java:141 - Failed to execute task, unexpected exception killed worker: {}
java.lang.IllegalStateException: Shutdown in progress
    at java.lang.ApplicationShutdownHooks.remove(ApplicationShutdownHooks.java:82) ~[na:1.7.0_80]
    at java.lang.Runtime.removeShutdownHook(Runtime.java:239) ~[na:1.7.0_80]
    at org.apache.cassandra.service.StorageService.removeShutdownHook(StorageService.java:728) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.utils.JVMStabilityInspector$Killer.killCurrentJVM(JVMStabilityInspector.java:119) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.utils.JVMStabilityInspector$Killer.killCurrentJVM(JVMStabilityInspector.java:109) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.utils.JVMStabilityInspector.inspectThrowable(JVMStabilityInspector.java:68) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:168) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]

and the repair fails:

Repair session 24406790-8014-11e5-bf74-a1fb6926eba3 for range (8334390792461377170,8383846774169681811] failed with error Endpoint /x.x.x.x died

I've tried running nodetool repair -seq, but the result is the same.

Questions:

  • How much memory does nodetool repair need, and how can I check it?
  • How can I rebalance the ring now? Is there any way to run the repair step by step?
  • If not, can I add "virtual" RAM (maybe as swap), increase the heap, and then trigger the repair?

Solution

  • Running repair doesn't rebalance the ring. What you want is to run nodetool rebuild on the new node to stream data to it.
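
As a rough illustration of the rebuild step, the sketch below checks ring status, runs nodetool rebuild against the new node, and checks status again. It is a minimal sketch, not a tested procedure for this cluster: NEW_NODE_HOST and the 127.0.0.1 default are hypothetical placeholders, and the run and rebuild_new_node helpers are made up for this example. By default it only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch only: prints the nodetool commands unless DRY_RUN=0 is set.
# NEW_NODE_HOST is a hypothetical placeholder for the new node's address.

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: rebuild the node at the given host.
rebuild_new_node() {
    host="$1"
    # Check that the nodes are listed and up before streaming.
    run nodetool -h "$host" status
    # Stream data to the new node from the existing node.
    run nodetool -h "$host" rebuild
    # Look at load and ownership again once streaming has finished.
    run nodetool -h "$host" status
}

rebuild_new_node "${NEW_NODE_HOST:-127.0.0.1}"
```

Running it as-is prints the three commands without touching the cluster, which makes it easy to review them before executing anything on a node that has already run out of heap once.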