I've set up the following test:
Expectation:
Run
traces
| where message == "Event received"
| summarize count() by bin(timestamp,1s), cloud_RoleInstance
| render timechart
(This is 3 runs of 10k events each, to eliminate the "pod not warmed up" variable.)
Note that there's little or no overlap between the pods' activity, as if one of them were holding a lock that, at some point, is mysteriously released and picked up by the other pod.
Relevant Consumer code:
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
    _processor = new EventProcessorClient(_storageClient, _consumerGroup, _hubConnection, _eventHubName);
    _processor.ProcessEventAsync += ProcessEventHandler;
    _processor.ProcessErrorAsync += ProcessErrorHandler;
    // Start the processing
    await _processor.StartProcessingAsync(stoppingToken);
}
internal async Task ProcessEventHandler(ProcessEventArgs eventArgs)
{
    _logger.LogTelemetry("Event received");
    await eventArgs.UpdateCheckpointAsync(eventArgs.CancellationToken);
}
There is actually nothing wrong with the code above. We discussed this in a GitHub issue and were able to observe the expected behavior when dealing with larger batches (500k events).
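
To check how the load is shared on a larger run, a query along these lines may help. It is only a sketch that reuses the table, message text, and column names from the question; the output column names (events, firstSeen, lastSeen) are made up for illustration. It totals the events handled by each pod and shows when each pod was first and last active, so overlapping time ranges suggest the pods were working concurrently.

```kusto
traces
| where message == "Event received"
// One row per pod: how many events it handled and over which time span
| summarize events = count(), firstSeen = min(timestamp), lastSeen = max(timestamp) by cloud_RoleInstance
```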