What hash function does EventHub use to map a partitionkey to a partitionid?


I am using an Azure EventHub to process customer events.

The customer's id is used as the partitionkey for EventHub. The key is hashed and used to select an EventHub partition, so all events for a given customer end up on the same partition.

The EventHub documentation mentions the following:

You can use a partition key to map incoming event data into specific partitions for the purpose of data organization. The partition key is a sender-supplied value passed into an event hub. It is processed through a static hashing function, which creates the partition assignment. If you don't specify a partition key when publishing an event, a round-robin assignment is used.

There are 10 partitions in my EventHub, and I'm now trying to work out which customers end up on the same partitionid. A list of all customer ids is available.

However, I cannot find documentation on the exact hashing function used by EventHub to determine the matching partitionid.

If this is not available, what would be the best approach to determine the partitionid for a given partitionkey?


Solution

  • The algorithm is not publicly documented, nor is there a supported way to determine the assigned partition for a given key. It is strongly advised that your application not assume it can reproduce the service hash reliably.

    If you need to understand what partition data appears in, it is recommended that your application assign a specific partition when publishing events.

    That said, the algorithm used is not an off-the-shelf standard but a customized variation. You'd need the source to be able to reproduce its results in all cases.
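
The recommendation above (assign a specific partition yourself instead of relying on the service hash) can be sketched as follows. This is a minimal illustration, not the hash EventHub applies to partition keys: `partition_for` and `group_by_partition` are hypothetical helper names, SHA-256 is an arbitrary choice of stable hash, and the partition ids are assumed to be the strings "0" through "9" for a 10-partition hub.

```python
import hashlib
from collections import defaultdict

PARTITION_COUNT = 10  # assumption: matches the 10 partitions in the question


def partition_for(customer_id: str, partition_count: int = PARTITION_COUNT) -> str:
    """Map a customer id to a partition id with an application-defined hash.

    This is NOT the EventHub service hash. It is a stand-in the application
    controls, so the mapping can be reproduced offline.
    """
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    # Read the first 8 bytes as an unsigned integer, then reduce it
    # modulo the partition count to get an index in [0, partition_count).
    index = int.from_bytes(digest[:8], "big") % partition_count
    return str(index)


def group_by_partition(customer_ids, partition_count: int = PARTITION_COUNT) -> dict:
    """Group customer ids by the partition id the application assigns them."""
    groups = defaultdict(list)
    for customer_id in customer_ids:
        groups[partition_for(customer_id, partition_count)].append(customer_id)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical customer ids, for illustration only.
    customers = [f"customer-{i}" for i in range(20)]
    for partition_id, members in sorted(group_by_partition(customers).items()):
        print(partition_id, members)
```

When publishing, the computed id would be passed as the explicit partition rather than as a partition key; in the `azure-eventhub` Python SDK, for example, `EventHubProducerClient.create_batch` accepts a `partition_id` argument. Note that a modulo-based mapping like this one changes for most customers if the partition count changes, and that sending to an explicit partition means the application, not the service, is responsible for distribution.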