Tags: java, multithreading, performance, disruptor-pattern

What causes this performance drop?


I'm using the Disruptor framework for performing fast Reed-Solomon error correction on some data. This is my setup:

          RS Decoder 1
        /             \
Producer-     ...     - Consumer
        \             /
          RS Decoder 8 
  • The producer reads blocks of 2064 bytes from disk into a byte buffer.
  • The 8 RS decoder consumers perform Reed-Solomon error correction in parallel.
  • The consumer writes files to disk.

In Disruptor DSL terms, the setup looks like this:

        RsFrameEventHandler[] rsWorkers = new RsFrameEventHandler[numRsWorkers];
        for (int i = 0; i < numRsWorkers; i++) {
            rsWorkers[i] = new RsFrameEventHandler(numRsWorkers, i);
        }
        disruptor.handleEventsWith(rsWorkers)
                .then(writerHandler);
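For context, each of the parallel decoders typically claims only its own share of events. The constructor arguments (numRsWorkers, i) suggest ordinal-based partitioning; the class body below is a minimal illustrative sketch of that idea, not the original RsFrameEventHandler:

```java
// Hypothetical sketch of a partitioned handler: each of the numWorkers
// instances only decodes events whose sequence maps to its own ordinal,
// so the eight decoders work in parallel on disjoint events.
public class PartitionedHandler {
    private final int numWorkers;
    private final int ordinal;

    public PartitionedHandler(int numWorkers, int ordinal) {
        this.numWorkers = numWorkers;
        this.ordinal = ordinal;
    }

    // Mirrors the shape of EventHandler.onEvent(event, sequence, endOfBatch);
    // returns true when this instance is the one that processes the event.
    public boolean onEvent(long sequence) {
        if (sequence % numWorkers != ordinal) {
            return false; // another worker owns this sequence
        }
        // ... Reed-Solomon decoding of the 2064-byte block would go here ...
        return true;
    }
}
```

With eight workers, worker 0 handles sequences 0, 8, 16, ..., worker 1 handles 1, 9, 17, ..., and so on.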

When I don't have a disk output consumer (no .then(writerHandler) part), the measured throughput is 80 M/s. As soon as I add a dependent consumer, even one that writes to /dev/null or doesn't write at all, throughput drops to 50-65 M/s.

I've profiled it with Oracle Mission Control, and this is what the CPU usage graph shows:

Without an additional consumer: [CPU usage graph]

With an additional consumer: [CPU usage graph]

What is this gray part in the graph, and where is it coming from? I suppose it has to do with thread synchronisation, but I can't find any statistic in Mission Control that would indicate such latency or contention.


Solution

  • Your hypothesis is correct: it is a thread synchronization issue.

    From the API documentation for EventHandlerGroup<T>.then (emphasis mine):

    Set up batch handlers to consume events from the ring buffer. These handlers will only process events after every EventProcessor in this group has processed the event.

    This method is generally used as part of a chain. For example, if handler A must process events before handler B: dw.handleEventsWith(A).then(B)

    This necessarily decreases throughput. Think of it like a funnel:

    [Event funnel diagram]

    The dependent consumer has to wait for every EventProcessor in the group to finish with an event before it can proceed, so the whole chain moves at the pace of the slowest decoder.
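    The dependency barrier effectively takes the minimum over the upstream sequences: the downstream writer can only advance to the position of the slowest decoder. A stdlib-only sketch of that gating rule (the names here are illustrative, not Disruptor's API):

    ```java
    import java.util.Arrays;

    // Illustrative model of a dependency barrier: a downstream consumer
    // may only read up to the minimum sequence published by all upstream
    // workers, so one lagging worker stalls the whole chain.
    public class BarrierModel {
        // workerSequences: last sequence processed by each parallel decoder.
        static long availableForDownstream(long[] workerSequences) {
            return Arrays.stream(workerSequences).min().orElse(-1L);
        }

        public static void main(String[] args) {
            // Seven decoders have reached sequence 100; one lags at 60.
            long[] seqs = {100, 100, 100, 60, 100, 100, 100, 100};
            // The writer can only advance to 60 - the funnel's bottleneck.
            System.out.println(availableForDownstream(seqs)); // prints 60
        }
    }
    ```

    This is why merely declaring a dependent consumer, even one that discards its input, costs throughput: the extra wait-and-synchronize step exists regardless of what the consumer does with the event.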