azure azure-iot-hub azure-eventhub azure-iot-hub-device-management

What is the function of a partition in Microsoft Azure IoT Hub?

When I create an IoT Hub, the Azure platform asks for the number of partitions. I have read about partitions in the topic "purpose of Azure iot hub device-to-cloud partitions", but I don't understand the relation between consumer groups and partitions, or how either of them relates to reading the data.

Solution

Partitions are there primarily to support scaling. By default, messages that are sent to the hub are distributed over those partitions.

So let's say we have 4 partitions (1-4) containing some messages (A-L):

Partition 1: A, E, I
Partition 2: B, F, J
Partition 3: C, G, K
Partition 4: D, H, L
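
The layout above is what a plain round-robin assignment would produce. The following is only a sketch under that assumption: `distribute` is a hypothetical helper written for illustration, not an Azure SDK call, and the service itself decides where a message actually lands.

```python
from collections import defaultdict
from string import ascii_uppercase


def distribute(messages, partition_count):
    """Assign messages to partitions 1..partition_count in round-robin order."""
    partitions = defaultdict(list)
    for index, message in enumerate(messages):
        # Message 0 goes to partition 1, message 1 to partition 2, and so on,
        # wrapping around after the last partition.
        partitions[index % partition_count + 1].append(message)
    return dict(partitions)


layout = distribute(ascii_uppercase[:12], 4)  # messages A-L over 4 partitions
for number, contents in sorted(layout.items()):
    print(f"Partition {number}: {', '.join(contents)}")
```

Running it prints the same four lines as the listing above.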

Let's also say that we have defined 2 consumer groups, C1 and C2. When you start a process to read messages from the hub, you specify a consumer group (if you don't, the default consumer group is used).

So let us have 2 readers: one (R1) configured to read using C1 and the other (R2) to read using C2.

Both readers have access to the same partitions and messages, but each has its own progress tracker. This is the important part!
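
To make the "own progress tracker" idea concrete, here is a small in-memory simulation. It does not use the Azure SDK; `ConsumerGroup`, `read` and `offsets` are hypothetical names chosen for this sketch, and the only thing it is meant to show is that reading through one consumer group does not move the position of another.

```python
PARTITIONS = {
    1: ["A", "E", "I"],
    2: ["B", "F", "J"],
    3: ["C", "G", "K"],
    4: ["D", "H", "L"],
}


class ConsumerGroup:
    """Hypothetical stand-in for a consumer group: one offset per partition."""

    def __init__(self, name):
        self.name = name
        self.offsets = {partition: 0 for partition in PARTITIONS}

    def read(self, partition, max_messages):
        """Return up to max_messages unread messages and advance this group's offset."""
        start = self.offsets[partition]
        batch = PARTITIONS[partition][start:start + max_messages]
        self.offsets[partition] = start + len(batch)
        return batch


c1 = ConsumerGroup("C1")  # used by reader R1
c2 = ConsumerGroup("C2")  # used by reader R2

for partition in PARTITIONS:
    c1.read(partition, max_messages=1)  # R1 reads one message per partition
    c2.read(partition, max_messages=3)  # R2 reads all three

print(c1.offsets)  # {1: 1, 2: 1, 3: 1, 4: 1}
print(c2.offsets)  # {1: 3, 2: 3, 3: 3, 4: 3}
```

The messages themselves are shared and never removed; only the per-group offsets differ.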

In a real-world scenario you might have a stream of data; let's assume log messages. The requirements are that all log messages have to be written to a database, and that messages with a higher log level need to be sent as high-priority alerts via SMS. If you have just one consumer group (C1, read by R1), all messages will eventually be processed. But if the database writes are slow, there may be a considerable delay between a message being delivered and that message being processed.

Now, if we have 2 consumer groups, the reader (R2) for the second consumer group (C2) can skip all messages with a low log level and only process the critical messages that are to be sent via SMS. This reader will go through the messages a lot faster than the one that needs to write every message to a database.

TL;DR: multiple consumer groups can be used to separate slow stream processors from faster stream processors. Each consumer group tracks its own progress in the stream.

So in the end progress may look like this:

Consumer group 1 (doing some time consuming processing)

Partition 1: A, E, I
Partition 2: B, F, J
Partition 3: C, G, K
Partition 4: D, H, L

Consumer group 2 (doing some fast message processing)

Partition 1: A, E, I
Partition 2: B, F, J
Partition 3: C, G, K
Partition 4: D, H, L

where the bold characters represent a processed message.

Edit
If I have two readers in the same consumer group, does each reader have its own progress, or is the progress per consumer group?

Each reader is connected to an event hub partition through a consumer group, and the progress is stored per partition per consumer group. So, in a sense, a reader has its own progress, but a reader is short-lived: a new instance of a reader connecting to the same partition through the same consumer group will continue where the previous reader left off.
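
A minimal sketch of that behaviour, again as a plain in-memory simulation rather than real SDK code: the `checkpoints` dict and the `Reader` class are hypothetical and stand in for wherever the progress is actually persisted. The point is that progress is keyed by consumer group and partition, not by reader instance.

```python
PARTITION_1 = ["A", "E", "I"]

# Hypothetical checkpoint store: progress is kept per (consumer group, partition),
# not per reader instance.
checkpoints = {}


class Reader:
    """Hypothetical short-lived reader bound to one consumer group and one partition."""

    def __init__(self, consumer_group, partition_id, messages):
        self.key = (consumer_group, partition_id)
        self.messages = messages

    def read_next(self):
        """Return the next unread message (or None) and checkpoint the new position."""
        position = checkpoints.get(self.key, 0)
        if position >= len(self.messages):
            return None
        checkpoints[self.key] = position + 1
        return self.messages[position]


first = Reader("C1", 1, PARTITION_1)
seen_by_first = [first.read_next()]
del first  # the first reader goes away

# A new reader for the same consumer group and partition resumes after the checkpoint.
replacement = Reader("C1", 1, PARTITION_1)
seen_by_replacement = [replacement.read_next(), replacement.read_next()]

# A reader in a different consumer group has its own position on the same partition.
other_group = Reader("C2", 1, PARTITION_1)
seen_by_other = other_group.read_next()

print(seen_by_first, seen_by_replacement, seen_by_other)
```

The replacement reader never sees message A again, while the reader in C2 starts from the beginning of the same partition.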