I'm using the Disruptor framework for performing fast Reed-Solomon error correction on some data. This is my setup:
                RS Decoder 1
               /            \
    Producer -      ...      - Consumer
               \            /
                RS Decoder 8
In Disruptor DSL terms, the setup looks like this:
    RsFrameEventHandler[] rsWorkers = new RsFrameEventHandler[numRsWorkers];
    for (int i = 0; i < numRsWorkers; i++) {
        rsWorkers[i] = new RsFrameEventHandler(numRsWorkers, i);
    }
    disruptor.handleEventsWith(rsWorkers)
             .then(writerHandler);
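Handlers passed together to handleEventsWith each see every event, so the (numRsWorkers, i) constructor arguments suggest that each decoder picks out its own share by sequence number. The question does not show RsFrameEventHandler itself, so the following is only a sketch of that assumed rule; the class name ShardRule and the method owns are hypothetical.

```java
/**
 * Hypothetical sharding rule. The real RsFrameEventHandler is not shown
 * in the question, so this only illustrates the assumed modulo scheme.
 */
class ShardRule {
    private final int numWorkers; // total number of parallel decoders
    private final int ordinal;    // index of this decoder, 0 <= ordinal < numWorkers

    ShardRule(int numWorkers, int ordinal) {
        this.numWorkers = numWorkers;
        this.ordinal = ordinal;
    }

    /** True if this worker is responsible for the event at the given sequence. */
    boolean owns(long sequence) {
        // Ring buffer sequences are non-negative, so % never yields a negative value here.
        return sequence % numWorkers == ordinal;
    }

    public static void main(String[] args) {
        ShardRule worker3 = new ShardRule(8, 3);
        System.out.println(worker3.owns(11)); // prints true  (11 % 8 == 3)
        System.out.println(worker3.owns(12)); // prints false (12 % 8 == 4)
    }
}
```

Under this scheme each decoder still advances its sequence past events it skips, but only does the expensive decode on one in every numRsWorkers events.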
When I don't have a disk output consumer (no .then(writerHandler) part), the measured throughput is 80 M/s. As soon as I add a consumer, even one that writes to /dev/null or doesn't write at all but is declared as a dependent consumer, performance drops to 50-65 M/s.
I've profiled it with Oracle Mission Control, and this is what the CPU usage graph shows:
Without an additional consumer: [CPU usage graph]
With an additional consumer: [CPU usage graph]
What is this gray part in the graph and where is it coming from? I suppose it has to do with thread synchronisation, but I can't find any other statistic in Mission Control that would indicate any such latency or contention.
Your hypothesis is correct: it is a thread synchronization issue.
From the API documentation for EventHandlerGroup<T>.then (emphasis mine):
Set up batch handlers to consume events from the ring buffer. These handlers will only process events after every EventProcessor in this group has processed the event. This method is generally used as part of a chain. For example if the handler A must process events before handler B:
This should necessarily decrease throughput. Think about it like a funnel:
The consumer has to wait for every EventProcessor to finish before it can proceed through the bottleneck.
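To make the funnel concrete, here is a minimal sketch of the gating rule in plain Java. It is a simplified model, not the Disruptor source: the class GatingSketch and its method are made up for illustration, and AtomicLong stands in for the library's sequence counters. The point is that the dependent consumer can only read up to the lowest position reached by any of the upstream handlers.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Simplified model (not the Disruptor source) of how a dependent consumer
 * is gated by the handlers that precede it in the chain.
 */
class GatingSketch {

    /** Returns the smallest sequence among the upstream handlers. */
    static long minimumSequence(AtomicLong[] upstream) {
        long min = Long.MAX_VALUE;
        for (AtomicLong sequence : upstream) {
            min = Math.min(min, sequence.get());
        }
        return min;
    }

    public static void main(String[] args) {
        // Eight hypothetical decoder positions; decoder 3 lags behind the rest.
        AtomicLong[] decoders = new AtomicLong[8];
        for (int i = 0; i < decoders.length; i++) {
            decoders[i] = new AtomicLong(100);
        }
        decoders[3].set(42);

        // The writer may only consume up to the slowest decoder's position.
        System.out.println(minimumSequence(decoders)); // prints 42
    }
}
```

Every time the writer checks for new work it has to read all eight decoder sequences, which are being written concurrently by eight other threads, so adding it introduces extra cross-thread traffic even if the handler body does nothing.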