
Azure Eventhub / Event Processor Host: Partitioning not working as predicted


We are working on a project that uses Azure Event Hubs. We use the Event Processor Host to process the data from the Event Hub. We have 32 partitions distributed across 3 nodes and are wondering how the Event Processor Host distributes and balances the partitions across the receivers / nodes, especially when a partition key is used.

[Figure: All data from all 4 customers]

We currently have 4 different customers (blue, orange, purple and light blue) which send us different amounts of data. As you can see, the blue customer on the left sends approx. 132k strings of data, while the light blue customer on the right only sends 28. Our theory was that, given a partition key based on the customer (the color identification), a customer's data would only be placed on one node. Instead we can see that the data is somehow evenly distributed across the 3 nodes, as seen below:

Node 1: [Figure: Node 1]

Node 2: [Figure: Node 2]

Node 3: [Figure: Node 3]

Is there something we've misunderstood about how the partition key works? From what we've read in the documentation, a "round-robin" approach is used when no partition key is specified, but even with a partition key the data is somehow distributed evenly. Are we somehow stressing the nodes, with the blue customer having a huge amount of data and another customer having almost nothing? Or what is going on?

To visualize our theory, we've drawn the following:

[Figure: Possibly what is happening]

So are we stressing the top node with the blue customer, so that in the end it has to move a partition to the middle node?


Solution

  • A partition key is intended to be used when you want to be sure that a set of events is routed to the same partition, but you don't want to assign an explicit partition. In short, use of a partition key is an explicit request to control routing and prevents the service from balancing across partitions.

    When you specify a partition key, it is used to produce a hash value that the Event Hubs service uses to assign the partition to which the event will be routed. Every event that uses the same partition key will be published to the same partition.

    To allow the service to round-robin when publishing, do not specify either a partition key or an explicit partition identifier.
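
As a rough illustration of the routing described above, here is a minimal Python sketch. It is not the Event Hubs implementation: the SHA-256 hash is only a stand-in for whatever hash the service uses, and `partition_for_key`, `node_for_partition` and the modulo ownership rule are hypothetical names and assumptions made up for this example. The 32 partitions, 3 nodes and the customer names are taken from the question.

```python
import hashlib

PARTITION_COUNT = 32  # number of partitions, from the question
NODE_COUNT = 3        # number of processing nodes, from the question


def partition_for_key(partition_key: str, partition_count: int = PARTITION_COUNT) -> int:
    """Map a partition key to a partition id.

    Assumption: SHA-256 is a stand-in hash chosen for this sketch.
    It is NOT the hash function the Event Hubs service uses.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def node_for_partition(partition_id: int, node_count: int = NODE_COUNT) -> int:
    """Map a partition id to the node that reads it.

    Assumption: a simple modulo spread stands in for however the
    processor hosts actually divide partition ownership among themselves.
    """
    return partition_id % node_count


if __name__ == "__main__":
    # One partition key per customer, as in the question.
    for customer in ["blue", "orange", "purple", "light blue"]:
        partition_id = partition_for_key(customer)
        node_id = node_for_partition(partition_id)
        print(f"{customer!r} -> partition {partition_id} -> node {node_id}")
```

In this sketch each key always maps to the same partition, so four keys can occupy at most four of the 32 partitions. Which node ends up reading a given partition depends on which host currently owns that partition, which the modulo rule only approximates.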