I'm working on a high-throughput application that uses EventHub. According to the documentation, achieving very high throughput from a single sender requires client-side batching (without exceeding the 256 KB limit per event).
Best Practices for performance improvements using Service Bus brokered messaging recommends client-side batching to improve performance. It says client-side batching is available for queue and topic clients: the client delays sending messages for a certain period of time, then transmits them in a single batch.
Is client-side batching available in the EventHub client?
Short answer: EventHubs is designed to support very high throughput scenarios, and client-side batching is one of the key features that enables this. The API is `EventHubClient.SendBatch(IEnumerable<EventData>)`.
Long Story:
The link that you found, Best Practices for performance improvements using Service Bus brokered messaging, applies to ServiceBus Queues & Topics, which use a Microsoft proprietary protocol called SBMP that is not an open standard. We implemented BatchFlushInterval in that protocol. This was a while back (around 2010, I'd guess), when the AMQP protocol wasn't standardized yet. By the time we started building the Azure EventHubs service, AMQP had become the standard protocol for implementing performant messaging solutions, so we used AMQP as our first-class protocol for Event Hubs. BatchFlushInterval doesn't have any effect in EventHubs (AMQP).
`EventHubClient` translates every raw event that you send to EventHub into an AmqpMessage (refer to the Messaging section of the AMQP Protocol Specification). To do that, as the protocol requires, it adds a few extra bytes to each message. The estimated size of each serialized EventData (as an AmqpMessage) is available through the property `EventData.SerializedSizeInBytes`.
With that background, coming to your scenario: the best way to achieve very high throughput is to use the `EventHubClient.SendBatch(IEnumerable<EventData>)` API. The contract of this API is that, before invoking `SendBatch`, the caller needs to make sure the serialized size of the batch of messages doesn't exceed 256k. Internally, this API converts the `IEnumerable<EventData>` into one single AmqpMessage and sends it to the EventHub service. The limit that the EventHubs service imposes on a single AmqpMessage, as of 4-25-2016, is 256k. One more detail: when the list of `EventData` is translated into a single AmqpMessage, `EventHubClient` needs to promote some information that is common to all messages in the batch (such as `partitionKey`) into the BatchMessage header. This information is guaranteed to be at most 6k.
So, all in all, the caller needs to keep track of the aggregate size of all `EventData` in the `IEnumerable<EventData>` and make sure that it stays below 250k (the 256k limit minus the 6k reserved for the batch header).
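To make that bookkeeping concrete, here is a minimal sketch of a greedy splitter that groups items into batches whose total size stays within a budget. `SizeBatcher` and `Split` are hypothetical names for illustration, not part of the SDK, and the helper is deliberately generic so it runs without any Event Hubs dependency. It assumes the size measure you pass in is the serialized size discussed above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not an Event Hubs SDK type.
public static class SizeBatcher
{
    // Greedily groups items into batches whose summed size is at most maxBatchBytes.
    // sizeOf is supplied by the caller (for EventData this would be the
    // SerializedSizeInBytes property mentioned above).
    public static IEnumerable<List<T>> Split<T>(
        IEnumerable<T> items, Func<T, long> sizeOf, long maxBatchBytes)
    {
        var batch = new List<T>();
        long batchBytes = 0;

        foreach (var item in items)
        {
            long size = sizeOf(item);
            if (size > maxBatchBytes)
            {
                // A single oversized item can never be sent, so fail loudly.
                throw new ArgumentException("A single item exceeds the batch budget.");
            }

            // Close the current batch if adding this item would overflow it.
            if (batchBytes + size > maxBatchBytes)
            {
                yield return batch;
                batch = new List<T>();
                batchBytes = 0;
            }

            batch.Add(item);
            batchBytes += size;
        }

        // Emit the final, partially filled batch.
        if (batch.Count > 0)
        {
            yield return batch;
        }
    }
}
```

With the API described above, a call might look like `SizeBatcher.Split(events, e => e.SerializedSizeInBytes, 250 * 1024)`, passing each resulting list to `SendBatch`. Reading "250k" as 250 * 1024 bytes is an assumption; use a smaller budget if you want extra headroom.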
EDIT ON 09/14/2017
We added the `EventHubClient.CreateBatch` API to support this scenario.
There is no more guesswork involved in constructing a batch of `EventData`. Get an empty `EventDataBatch` from the `EventHubClient.CreateBatch` API, then use the `TryAdd(EventData)` API to add events to the batch.
Finally, use `EventDataBatch.ToEnumerable()` to get the underlying events to pass to the `EventHubClient.Send()` API.
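The fill-and-flush loop this implies can be sketched as follows. This is only an outline of the pattern, written against caller-supplied delegates so that it is self-contained. `BatchPump` and `SendAll` are hypothetical names, and the `tryAdd` delegate is assumed to behave like `TryAdd(EventData)`, returning false when the event does not fit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not an Event Hubs SDK type.
public static class BatchPump
{
    // Fills a batch via tryAdd until it reports "full", sends it, then
    // starts a fresh batch with the item that did not fit.
    // Returns the number of batches handed to send.
    public static int SendAll<T, TBatch>(
        IEnumerable<T> items,
        Func<TBatch> createBatch,
        Func<TBatch, T, bool> tryAdd,
        Action<TBatch> send)
    {
        int batchesSent = 0;
        int inBatch = 0;
        TBatch batch = createBatch();

        foreach (var item in items)
        {
            if (!tryAdd(batch, item))
            {
                if (inBatch == 0)
                {
                    throw new InvalidOperationException("Item does not fit in an empty batch.");
                }

                // Current batch is full: flush it and retry in a new one.
                send(batch);
                batchesSent++;
                batch = createBatch();
                inBatch = 0;

                if (!tryAdd(batch, item))
                {
                    throw new InvalidOperationException("Item does not fit in an empty batch.");
                }
            }
            inBatch++;
        }

        // Flush whatever is left.
        if (inBatch > 0)
        {
            send(batch);
            batchesSent++;
        }
        return batchesSent;
    }
}
```

With the API described above, `createBatch` would wrap `EventHubClient.CreateBatch`, `tryAdd` would call `TryAdd` on the `EventDataBatch`, and `send` would pass `ToEnumerable()` to the client's send call. The exact wiring depends on the SDK version you are using.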