azure-functions azureservicebus azure-eventhub

Can I avoid Event Hub duplicates by plugging it with Service Bus Sessions?


Our application architecture is as follows:

Third party event hub ==> our azure function ==> our event hub ==> our event hub capture.

The issue is that we get duplicate messages quite often, and we don't have any primary key in the data.

I read online that Service Bus sessions could avoid this duplicate issue:


Azure Service Bus now supports sessions, so you can do in-order queue processing with Service Bus queues and topics in addition to Event Hubs listed below. Service Bus Sessions provide the added benefit of reprocessing failures individually instead of in batches. While Event Hubs can guarantee order as shown below, if a partition lock is lost the in-order batch could resume in another instance, causing duplicates. Consider using Service Bus Sessions if this is an issue. Both provide at-least-once delivery guarantees.

I am new to Azure and streaming cloud architecture in general.

My question is the following:

  • Could Service Bus sessions be plugged into our current architecture?
  • Or is it rather a service that competes with our event hub?

I am not sure we would be ready to give up on our event hub now as we have just invested resources implementing it.


Solution

  • Service Bus sessions don't really guarantee that there will be no duplicates, since delivery is still "at least once". However, Service Bus has a feature called duplicate detection, which helps prevent duplicates among messages coming into the namespace: a message whose MessageId was already seen within the configured detection window is dropped. It won't help with outgoing duplicates (mostly caused by transient network issues).

    Besides the above, the behavior you are describing doesn't sound normal. Yes, Event Hubs can produce duplicates, but it should not happen very often. If it does, I would suggest focusing on the root cause of so many duplicates (you can open a Microsoft support ticket for help finding it). But if the duplicates are few, then I suggest you make your consumer resilient to duplicates, as suggested by the other answer.
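
To illustrate what "resilient to duplicates" could look like when the data has no primary key, here is a minimal sketch in plain Python. It assumes the events are JSON-like dicts and that two events with identical content really are duplicates, which may not hold for your data. The names `content_key`, `RecentKeys` and `drop_duplicates` are made up for this example and are not part of any Azure SDK.

```python
import hashlib
import json
from collections import OrderedDict


def content_key(event: dict) -> str:
    """Derive a stable key from the payload, since there is no primary key.

    Assumption: events with byte-identical canonical JSON are duplicates.
    """
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class RecentKeys:
    """Bounded in-memory record of recently seen keys (oldest evicted first)."""

    def __init__(self, max_size: int = 10_000):
        self._keys = OrderedDict()
        self._max_size = max_size

    def add_if_new(self, key: str) -> bool:
        """Return True if the key was not seen recently, and remember it."""
        if key in self._keys:
            self._keys.move_to_end(key)  # refresh recency of a repeated key
            return False
        self._keys[key] = None
        if len(self._keys) > self._max_size:
            self._keys.popitem(last=False)  # evict the oldest key
        return True


def drop_duplicates(events, seen: RecentKeys):
    """Keep only events whose content key has not been seen recently."""
    return [e for e in events if seen.add_if_new(content_key(e))]


if __name__ == "__main__":
    seen = RecentKeys()
    batch = [{"a": 1, "b": 2}, {"b": 2, "a": 1}, {"a": 2}]
    # The second event differs only in key order, so it is filtered out.
    print(drop_duplicates(batch, seen))
```

Note that this only remembers keys inside one process. If the function scales out to several instances, the "seen" set would have to live in a shared store instead. The same content hash could also be set as the MessageId if you ever route through a Service Bus entity with duplicate detection enabled, since that feature matches on MessageId.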