We are currently looking at Azure Event Hubs as a mechanism to dispatch messages to background processors. At the moment the queue-based systems are being used.
Most processors are writing data to SQL Server databases, and the writes are wrapped in transactions.
Event Hubs are positioned as at-least-once communication channel, so duplicate messages should be expected. EventProcessorHost
is the recommended API on the read side, which automates lease management and checkpointing using Azure Blob Storage.
But we have an idea, for some most critical processors, to implement checkpointing ourselves using a SQL Server table inside the same database, and write the checkpoint inside the same transaction of the processor. This should give us the strong guarantee of exactly once delivery when needed.
Ignoring lease management for now (just run 1 processor per partition), is SQL-based checkpointing a good idea? Are there other drawbacks, except the need to work on lower level of API and handle checkpoints ourselves?
Azure Storage is the built-in solution but we are not limited with that. If most of your processors are writing data to SQL Server databases and you do not want to have EventProcessorHost store checkpoints in Azure Storage (that requires a storage account), in my view, storing checkpoints in your SQL database which provide a easy way to make process event and manage checkpoint transactionally, it would be a good solution.
You could write your own checkpoint manager using ICheckpointManager interface to storing checkpoints in your SQL database.