Tags: java, jvm, apache-storm

Apache Storm deactivated topologies cause high CPU utilization


I am having an issue with high CPU usage for deactivated Apache Storm topologies. I can reliably re-create the issue using the steps below, but I haven't identified the exact cause or a solution yet.

The environment is a Storm cluster on which a single topology is running (the topology is extremely simple; I used the exclamation example). It is INACTIVE. Initially CPU usage is normal. However, when I kill all topology JVM processes on all supervisors and let Storm restart them, I find that some time later (~9 hours) the CPU usage per JVM process shoots up to nearly 100%. This does not happen with an ACTIVE topology, and I have tested more than one topology and observed the same results whenever they are in the INACTIVE state.
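For reference, here is a minimal sketch of the kind of topology I'm running, adapted from the storm-starter exclamation example (the parallelism values are illustrative, not the exact ones from my cluster):

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.testing.TestWordSpout;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class ExclamationTopology {

    // Appends "!!!" to every word it receives.
    public static class ExclamationBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            collector.emit(new Values(tuple.getString(0) + "!!!"));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("word", new TestWordSpout(), 10);
        builder.setBolt("exclaim1", new ExclamationBolt(), 3).shuffleGrouping("word");
        builder.setBolt("exclaim2", new ExclamationBolt(), 2).shuffleGrouping("exclaim1");

        Config conf = new Config();
        conf.setNumWorkers(2); // same worker count as the affected topology

        StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
    }
}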

Steps to re-create:

  1. Run 1 topology on an Apache Storm cluster
  2. Deactivate it (a programmatic sketch of this step follows the list)
  3. Kill all topology JVM processes on all supervisors (Storm will restart them)
  4. Observe that, some hours later, CPU usage on the supervisors shoots up to nearly 100% for all of the INACTIVE topology's JVM processes.
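For step 2 I normally run "storm deactivate <topology-name>" from the CLI. The same thing can also be done through the Nimbus Thrift client, roughly like this (a minimal sketch with error handling omitted; the class name is mine):

import java.util.Map;
import org.apache.storm.utils.NimbusClient;
import org.apache.storm.utils.Utils;

public class DeactivateTopology {
    public static void main(String[] args) throws Exception {
        // Read storm.yaml plus the defaults to locate Nimbus.
        Map conf = Utils.readStormConfig();
        NimbusClient nimbus = NimbusClient.getConfiguredClient(conf);
        try {
            // Equivalent to "storm deactivate <topology-name>" on the CLI.
            nimbus.getClient().deactivate(args[0]);
        } finally {
            nimbus.close();
        }
    }
}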

Environment

Apache Storm 1.1.0 running on 3 VMs: one Nimbus node and 2 supervisors.

Cluster Summary:

  • Supervisors: 2
  • Used Slots: 2
  • Available Slots: 38
  • Total Slots: 40
  • Executors: 50
  • Tasks: 50

The topology has 2 workers and 50 executors/tasks (threads).

Investigation so far:

Apart from being able to reliably re-create the issue, I have identified the threads using the most CPU in an affected topology JVM process. There are 102 threads in the process in total: 97 BLOCKED and 5 IN_NATIVE. The 23 threads using the most CPU all have identical stack traces (all in the BLOCKED state):

Thread 28558: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.parkNanos(long) @bci=11, line=338 (Compiled frame)
 - com.lmax.disruptor.MultiProducerSequencer.next(int) @bci=82, line=136 (Compiled frame)
 - com.lmax.disruptor.RingBuffer.next(int) @bci=5, line=260 (Interpreted frame)
 - org.apache.storm.utils.DisruptorQueue.publishDirect(java.util.ArrayList, boolean) @bci=18, line=517 (Interpreted frame)
 - org.apache.storm.utils.DisruptorQueue.access$1000(org.apache.storm.utils.DisruptorQueue, java.util.ArrayList, boolean) @bci=3, line=61 (Interpreted frame)
 - org.apache.storm.utils.DisruptorQueue$ThreadLocalBatcher.flush(boolean) @bci=50, line=280 (Interpreted frame)
 - org.apache.storm.utils.DisruptorQueue$Flusher.run() @bci=55, line=303 (Interpreted frame)
 - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
 - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1142 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)

I identified these threads by using jstack to take a thread dump of the process:

jstack -F <pid> > jstack-<pid>.txt

and top to identify the threads within the process that are using the most CPU (the thread id shown by top corresponds to the thread number in the jstack -F output):

top -H -p <pid> 

Has anyone come across this or a similar issue before? Any help would be much appreciated.


Solution

  • The issue happens because the RingBuffer of a DisruptorQueue fills up; when publishing threads then try to claim a slot, they effectively get stuck spinning in LockSupport.parkNanos(1L) (the MultiProducerSequencer.next frame in the stack traces above), as per my comment on the Storm JIRA.
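To make it concrete why threads the dump shows as parked can still burn a whole core: LockSupport.parkNanos(1L) requests a 1 ns sleep and returns almost immediately, so a claim loop built on it degenerates into a busy-wait. Here is a contrived, self-contained sketch of that loop shape (not Disruptor's actual code, just an illustration):

import java.util.concurrent.locks.LockSupport;

public class ParkNanosSpin {
    // Stand-in for "a slot became free in the ring buffer". It is never set,
    // just as no slot frees up in the full DisruptorQueue of an INACTIVE topology.
    private static volatile boolean slotAvailable = false;

    public static void main(String[] args) {
        long deadline = System.nanoTime() + 5_000_000_000L; // spin for ~5 s, then give up
        long cycles = 0;
        while (!slotAvailable && System.nanoTime() < deadline) {
            LockSupport.parkNanos(1L); // returns almost immediately: effectively a busy-wait
            cycles++;
        }
        // Prints a very large number; 23 threads doing this will pin the cores.
        System.out.println("park/unpark cycles in ~5 s: " + cycles);
    }
}

This is also consistent with the jstack output above: the hot flusher threads are reported as BLOCKED while sitting in parkNanos, even though they are continuously cycling through park and unpark.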