
Passing data between subsequent consumers in LMAX Disruptor (from unmarshaller to business logic)?


As explained in https://martinfowler.com/articles/lmax.html, I need to process my RingBuffer's events first with an unmarshaller and then with the business logic processor. Suppose it is configured like this (https://lmax-exchange.github.io/disruptor/docs/com/lmax/disruptor/dsl/Disruptor.html):

  Disruptor<MyEvent> disruptor =
      new Disruptor<MyEvent>(MyEvent.FACTORY, 32, Executors.newCachedThreadPool());
  EventHandler<MyEvent> handler1 = new EventHandler<MyEvent>() { ... };
  EventHandler<MyEvent> handler2 = new EventHandler<MyEvent>() { ... };
  disruptor.handleEventsWith(handler1);
  disruptor.after(handler1).handleEventsWith(handler2);

The idea is that handler1 is the unmarshaller and handler2 consumes what handler1 has processed.

Question: How exactly do I code the "unmarshalling and putting back to the disruptor" part? I found this explanation https://groups.google.com/forum/#!topic/lmax-disruptor/q6h5HBEBRUk but didn't quite understand it. Suppose an event arrives at the callback for handler1

void onEvent(T event, long sequence, boolean endOfBatch) throws Exception

(javadoc: https://lmax-exchange.github.io/disruptor/docs/com/lmax/disruptor/EventHandler.html)

that unmarshals some data from the event. Now I need to attach the unmarshalled data to the event so that handler2 can deal with the unmarshalled object.

What needs to be done to "update" the event? Is modifying the "event" object enough?


Solution

  • The impact of this really depends on your particular scenario; as always, if you are after low latency, you should try both approaches and benchmark.

    The most straightforward approach is to update the 'event' object in place; however, depending on your particular access pattern, this can forfeit some of the single-writer benefits of the disruptor. I will explain and offer some options.

    Suppose, for example, you have handler1 running in thread1 and handler2 running in thread2, with the initial event publisher on thread0.

    • Thread0 writes an entry into the buffer at slot 1
    • Thread1 reads the entry in slot 1 and writes into slot 1
    • Thread0 writes an entry into the buffer at slot 2
    • Thread2 reads from slot 1 and writes to output
    • Thread1 reads the entry in slot 2 and writes into slot 2
    • Thread2 reads from slot 2 and writes to output

    If you think of the physical memory layout, slot 1 and slot 2 are hopefully next to each other in memory; for example, they could be some subset of a byte array. As you can see, you are reading and writing alternately from different threads (probably different CPU cores) into adjacent chunks of memory, which can lead to false sharing / cache lines bouncing around. On top of that, your reads and writes through memory are not likely to be linear, so you will miss out on some of the benefits of the CPU caches.
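Concretely, the in-place update described above looks roughly like this sketch. The `EventHandler` interface below is a local stand-in for `com.lmax.disruptor.EventHandler` so the example compiles on its own, and the `MyEvent` fields and handler names are assumptions, not anything from the real API:

```java
import java.nio.charset.StandardCharsets;

public class MutateInPlace {

    // Local stand-in for com.lmax.disruptor.EventHandler.
    interface EventHandler<T> {
        void onEvent(T event, long sequence, boolean endOfBatch);
    }

    // Hypothetical event: the raw bytes and the unmarshalled result live side by side.
    static final class MyEvent {
        byte[] raw;     // written by the publisher (thread0)
        String parsed;  // written by handler1, read by handler2
    }

    // handler1: unmarshals the raw bytes and stores the result back on the same event.
    static final EventHandler<MyEvent> unmarshaller = (event, sequence, endOfBatch) ->
            event.parsed = new String(event.raw, StandardCharsets.UTF_8);

    // handler2: sees the already-populated 'parsed' field, because the sequence
    // barrier set up by disruptor.after(handler1) guarantees handler1 has
    // finished with this slot before handler2 is given its sequence.
    static final EventHandler<MyEvent> businessLogic = (event, sequence, endOfBatch) ->
            System.out.println("processing: " + event.parsed);

    public static void main(String[] args) {
        MyEvent event = new MyEvent();
        event.raw = "hello".getBytes(StandardCharsets.UTF_8);
        unmarshaller.onEvent(event, 0L, true);   // what thread1 would do
        businessLogic.onEvent(event, 0L, true);  // what thread2 would do later
    }
}
```

So yes, simply mutating the event object is enough for correctness; the memory-layout caveats above are about performance, not visibility.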

    Some other options which might be nicer:

    • Have separate ring buffers, where the first ring buffer holds raw data and the second holds unmarshalled events. This way the two kinds of data are sufficiently separated in memory to avoid these costs. However, this has a bandwidth cost.

    • Have the unmarshalling and the business work done in the same handler. Depending on how much work the unmarshaller and the handler each do, this might be viable.
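The first of these options (separate ring buffers) can be sketched as follows. The `ParsedPublisher` interface and all names here are hypothetical stand-ins, not the real `com.lmax.disruptor` API; in the real library the hand-off would claim a sequence on the second ring buffer, fill its pre-allocated event, and publish:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class TwoBufferPipeline {

    // Events of the two buffers live in separate pre-allocated arrays,
    // so raw bytes and parsed objects never share cache lines.
    static final class RawEvent { byte[] raw; }

    // Stand-in for publishing into the second (parsed) ring buffer.
    interface ParsedPublisher { void publish(String parsed); }

    // Handler on the raw buffer: unmarshal, then republish into the second buffer.
    static void onRawEvent(RawEvent event, ParsedPublisher parsedBuffer) {
        String parsed = new String(event.raw, StandardCharsets.UTF_8);
        parsedBuffer.publish(parsed);
    }

    public static void main(String[] args) {
        List<String> secondBuffer = new ArrayList<>(); // crude stand-in for ring buffer 2
        RawEvent e = new RawEvent();
        e.raw = "order-42".getBytes(StandardCharsets.UTF_8); // hypothetical payload
        onRawEvent(e, secondBuffer::add);
        System.out.println(secondBuffer);
    }
}
```

The business-logic handlers then subscribe only to the second buffer and never touch the raw-byte memory, which is what buys back the cache behaviour at the cost of the extra copy.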