Background- I have a backend app consuming telemetry messages from the built-in "messages/events" route on IoT Hub. The app consumes telemetry by creating a consumer with the EventHubClient from the Python SDK. Devices are provisioned to IoT Hub programmatically and given x509 certificates for authentication; the creation and expiration dates are valid on both the client and CA certificates. In the past, I had multiple devices sending to IoT Hub at the same time, all of them consumed by the backend app. After some time, we set up Azure Stream Analytics to listen to the same built-in route as the existing backend app. Fast forward a couple of months, and the original backend app now only consumes messages from one device ID.
Symptom- I have 2 devices, Device A and Device B. Device A's device ID is Bob, and the CN on its x509 certificate is Bob. Device B's device ID is Sally, and the CN on its certificate is Sally. Both were provisioned through the Device Provisioning Service (DPS) and signed by the same CA, which is loaded and verified in both DPS and IoT Hub. All telemetry sent using the credentials for Bob is consumed by both Stream Analytics and the original backend app. All telemetry sent using the credentials for Sally is consumed only by Stream Analytics. The physical device doesn't matter: if we use the Bob device ID/credentials on either Device A or Device B, messages are consumed by both backends, and if we use the Sally device ID/credentials, messages are only ever processed by Stream Analytics. Both Stream Analytics and the original backend app are set to the $Default consumer group. I believe the partition is irrelevant unless I'm using an Event Hub, but Stream Analytics doesn't have a field for partition ID, and the backend app consumer is using partition 0. All messages are being delivered to the events/messages built-in endpoint, and no messages are delivered to other endpoints.
Question- Why is my backend app only consuming messages with device ID/credentials for Bob?
I tried to give all relevant info, but if there is something I left out just let me know and I can provide more details.
Edits: I had already tried turning off Stream Analytics completely (and restarting the backend app just in case) so that only the backend app was consuming messages from the endpoint, which didn't help. After the first response, I also created a new consumer group on the endpoint and switched the Stream Analytics input to that new consumer group. No change in the "symptoms".
The issue was related to partition ID. I was using the azure.eventhub library to consume events from the IoT Hub backend. This library has been going through a major overhaul for the last 10 months or so. We were using a pre-release version (5.0.0b4, I think) because it included a lot of useful methods, and all of the sample code for that version (EventHubClient.create_consumer) specified partition "0". Since IoT Hub determines the partition ID based on the device ID, some devices were being sent to partition 1, which our consumer never read. Switching the partition ID in the create_consumer call confirmed the problem: the backend app then saw all telemetry from "Sally" but none from "Bob". Since Stream Analytics doesn't take a partition ID as input, I'm assuming it consumes all partitions, which is why it was processing all telemetry.
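To make the effect concrete, here is a minimal sketch that simulates it without talking to Azure. The lookup table only mirrors what I observed (Bob on partition "0", Sally on partition "1"); the real mapping is computed by IoT Hub, and the names OBSERVED_PARTITION and visible_devices are hypothetical helpers made up for this illustration.

```python
# Assumption: this table mimics the outcome seen in this case. IoT Hub computes
# the real device-to-partition mapping itself; this is not its algorithm.
OBSERVED_PARTITION = {"Bob": "0", "Sally": "1"}


def visible_devices(sent_by, partition_id=None):
    """Return the device IDs a consumer would see, in send order.

    sent_by: device IDs, one entry per telemetry message sent.
    partition_id: the single partition the consumer reads, or None to read
    every partition (what Stream Analytics appeared to be doing).
    """
    return [
        device_id
        for device_id in sent_by
        if partition_id is None or OBSERVED_PARTITION[device_id] == partition_id
    ]


messages = ["Bob", "Sally", "Bob", "Sally"]
print(visible_devices(messages, partition_id="0"))  # consumer pinned to partition "0"
print(visible_devices(messages, partition_id="1"))  # consumer pinned to partition "1"
print(visible_devices(messages))                    # consumer reading all partitions
```

A consumer pinned to one partition silently drops every device that maps elsewhere, which matches the symptom: nothing fails, the messages just never show up.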
The solution: I am now using azure.eventhub 5.0.1 and the EventHubConsumerClient.receive() method to consume the messages. It seems to be doing the job for all partitions. The only potential issue is that it appears to pull batches of data from each partition in turn rather than reading the hub as a whole in real time. For now, I'm not sending data at a high enough frequency for that to matter, but with a very high sample rate I believe it will read chunks of messages from one partition, and if the queue is large enough it will delay processing messages from the other partitions until it finishes its batch. If you're running on a stateless platform like a container instance, you also need a storage account for saving checkpoints.
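For reference, a minimal sketch of the receive() approach, not my exact production code. It assumes the Event Hub-compatible connection string of the IoT Hub built-in endpoint is stored in an environment variable (IOTHUB_EVENTHUB_CONN_STR is a name invented for this example), and it leaves out the checkpoint store. The device ID is read from the iothub-connection-device-id system property.

```python
import os
from collections import Counter

# System property that IoT Hub attaches to each device-to-cloud message.
DEVICE_ID_KEY = b"iothub-connection-device-id"

# (partition_id, device_id) -> number of messages seen; handy for checking
# that every device shows up regardless of partition.
seen = Counter()


def on_event(partition_context, event):
    """Callback that receive() invokes once per event."""
    raw = event.system_properties.get(DEVICE_ID_KEY, b"unknown")
    device_id = raw.decode("utf-8") if isinstance(raw, bytes) else str(raw)
    seen[(partition_context.partition_id, device_id)] += 1
    print(partition_context.partition_id, device_id, event.body_as_str())


def main():
    # Imported here so the callback can be exercised without the SDK installed.
    from azure.eventhub import EventHubConsumerClient

    client = EventHubConsumerClient.from_connection_string(
        os.environ["IOTHUB_EVENTHUB_CONN_STR"],  # hypothetical variable name
        consumer_group="$Default",
    )
    with client:
        # No partition_id argument, so the client receives from all partitions.
        # "@latest" means only events enqueued after the receiver starts.
        client.receive(on_event=on_event, starting_position="@latest")
```

Calling main() blocks and prints one line per message. The point of the sketch is the missing partition_id: the old create_consumer samples pinned the consumer to partition "0", while receive() without that argument covers every partition.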
Edit: Confirmed- with a high sample rate, the receiver will listen to one partition for some time, usually 1 or 2 minutes, and then switch to the other partition. The result is that with 3 devices sending frequent data, for a couple of minutes I only get data from 1 device, and for a couple of minutes after that I only get data from the other 2 devices. I never get data from all 3 devices processed in real time. Bummer.