Event Hubs don't let you store messages for longer than 7 (maybe up to 30) days. What is Azure's suggested architecture for PaaS Event Sourcing with these limitations? If it's Event Hubs + snapshotting, what happens if we somehow need to rebuild that state? Additionally, is Azure Stream Analytics Event Hubs' answer to KSQL/Spark?
Great question!
Yes, Event Hubs is intended to be used for the Event Sourcing (or append-only log) pattern. Event Hubs can be used as a source or sink for stream processing and analytics engines like Spark, so it is not their competitor. In general, Event Hubs offers capabilities similar to those of Apache Kafka.
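To make the append-only log idea concrete, here is a minimal in-memory sketch of one partition read by two independent consumer groups (say, a Spark job and an archiver). `AppendOnlyLog` and `ConsumerGroup` are hypothetical names for illustration only; this is not the Event Hubs SDK, and a real hub adds partitions, checkpointing, and retention on top of this.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class AppendOnlyLog:
    """Hypothetical in-memory stand-in for a single partition (not the Event Hubs SDK)."""
    events: List[Any] = field(default_factory=list)

    def append(self, event: Any) -> int:
        # Events are only ever appended; the returned offset identifies the event.
        self.events.append(event)
        return len(self.events) - 1

    def read_from(self, offset: int) -> List[Any]:
        # Readers never mutate the log; they only choose where to start reading.
        return self.events[offset:]


@dataclass
class ConsumerGroup:
    """Each consumer group keeps its own read position over the same log."""
    name: str
    offset: int = 0

    def poll(self, log: AppendOnlyLog) -> List[Any]:
        batch = log.read_from(self.offset)
        self.offset += len(batch)  # advance only this group's position
        return batch


log = AppendOnlyLog()
log.append("order-created")
log.append("order-paid")
spark = ConsumerGroup("spark")
archiver = ConsumerGroup("archiver")
print(spark.poll(log))     # both events
log.append("order-shipped")
print(spark.poll(log))     # only the new event
print(archiver.poll(log))  # all three events, read independently
```

Each group only advances its own offset, so a new downstream reader can start from the beginning of the retained log without affecting the others.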
And yes, for rebuilding state from the transactions in the append-only log, snapshotting is definitely the recommended approach!
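A rough sketch of snapshot-plus-replay, assuming a hypothetical bank-account stream whose events are `("deposit" | "withdraw", amount)` tuples: a snapshot records the folded state together with the first offset it does not yet include, and rebuilding means loading that snapshot and replaying only the log tail after it. The names `apply`, `Snapshot`, `take_snapshot`, and `rebuild` are illustrative, not from any Azure library.

```python
from dataclasses import dataclass
from typing import List, Tuple

Event = Tuple[str, int]  # ("deposit" | "withdraw", amount): hypothetical event shape


def apply(balance: int, event: Event) -> int:
    """Fold one event into the current state."""
    kind, amount = event
    if kind == "deposit":
        return balance + amount
    if kind == "withdraw":
        return balance - amount
    raise ValueError(f"unknown event type: {kind}")


@dataclass(frozen=True)
class Snapshot:
    state: int        # folded state covering log[:next_offset]
    next_offset: int  # first log offset NOT yet included in `state`


def take_snapshot(log: List[Event], upto: int) -> Snapshot:
    state = 0
    for event in log[:upto]:
        state = apply(state, event)
    return Snapshot(state, upto)


def rebuild(snapshot: Snapshot, log: List[Event]) -> int:
    # Start from the snapshot and replay only the tail of the log.
    state = snapshot.state
    for event in log[snapshot.next_offset:]:
        state = apply(state, event)
    return state


history = [("deposit", 100), ("withdraw", 30), ("deposit", 50)]
snap = take_snapshot(history, 2)
print(snap)                    # Snapshot(state=70, next_offset=2)
print(rebuild(snap, history))  # 120
```

Only the tail after the latest snapshot has to still be readable, which is what makes a bounded retention period workable.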
While shaping Event Hubs as a product offering, our considerations in assigning a default value for retentionPeriod were:
So it was clear that we didn't need an infinite log, and that a time bound of one day would suffice for most use cases. Hence, we started with a default of 1 day and provided a knob to raise it up to 7 days.
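The consequence of a bounded retention period can be sketched as follows: once events age out, a snapshot whose next offset points before the oldest retained event can no longer be replayed from the hub. `RetainedLog` is a hypothetical in-memory model; the 1-day default and 7-day cap mirror the values described in this answer, and modeling expiry as an exact cutoff is a simplification.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Tuple

MAX_RETENTION = timedelta(days=7)  # cap described in this answer (assumption: limits may differ today)


@dataclass
class RetainedLog:
    """Hypothetical in-memory partition that drops events older than `retention`."""
    retention: timedelta = timedelta(days=1)  # mirrors the 1-day default described above
    _entries: List[Tuple[int, datetime, int]] = field(default_factory=list)  # (offset, enqueued_at, payload)
    _next_offset: int = 0

    def __post_init__(self) -> None:
        if self.retention > MAX_RETENTION:
            raise ValueError("retention above 7 days: push older data to an archival store instead")

    def append(self, payload: int, now: datetime) -> int:
        offset = self._next_offset
        self._entries.append((offset, now, payload))
        self._next_offset += 1  # offsets keep growing even after old entries expire
        return offset

    def expire(self, now: datetime) -> None:
        # Simplification: expiry is modeled as an exact cutoff.
        cutoff = now - self.retention
        self._entries = [e for e in self._entries if e[1] >= cutoff]

    def earliest_offset(self) -> int:
        return self._entries[0][0] if self._entries else self._next_offset

    def read_from(self, offset: int) -> List[int]:
        if offset < self.earliest_offset():
            raise LookupError(
                f"offset {offset} has expired; earliest retained is {self.earliest_offset()}"
            )
        return [payload for (o, _, payload) in self._entries if o >= offset]


t0 = datetime(2024, 1, 1)
log = RetainedLog()
log.append(10, t0)
log.append(20, t0 + timedelta(hours=12))
log.append(30, t0 + timedelta(hours=30))
log.expire(now=t0 + timedelta(hours=30))  # the event at t0 is older than 1 day
print(log.read_from(1))  # [20, 30]
```

Here a snapshot with next offset 1 can still be rebuilt from the hub, while one at offset 0 cannot; that second case is the one the archival approach addresses.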
If you think you might have a case where you will need to go back in time more than 7 days to rebuild a snapshot (for example, for debugging, which is generally not a 99% scenario, although designing and accommodating for it is very wise), the recommended approach is to push the data to an archival store.
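A minimal sketch of the archival approach, assuming a hypothetical archiver that checkpoints by offset and a plain dict standing in for the archival store: old offsets are served from the archive, recent ones from the hub, and the rebuild fails loudly if there is a gap. `archive_new_entries` and `read_for_rebuild` are illustrative names only. Note that the archiver has to keep up within the retention period, otherwise events expire before they are copied.

```python
from typing import Dict, List, Tuple

Entry = Tuple[int, int]  # (offset, payload): hypothetical event shape


def archive_new_entries(hot_log: List[Entry], archive: Dict[int, int], checkpoint: int) -> int:
    """Copy every hot-log entry at or after `checkpoint` into the archive; return the new checkpoint."""
    for offset, payload in hot_log:
        if offset >= checkpoint:
            archive[offset] = payload
            checkpoint = offset + 1
    return checkpoint


def read_for_rebuild(from_offset: int, hot_log: List[Entry], archive: Dict[int, int]) -> List[int]:
    """Serve recent offsets from the hub and older ones from the archive."""
    hot = dict(hot_log)
    end = max(list(hot) + list(archive), default=from_offset - 1) + 1
    out: List[int] = []
    for offset in range(from_offset, end):
        if offset in hot:
            out.append(hot[offset])
        elif offset in archive:
            out.append(archive[offset])
        else:
            raise LookupError(f"offset {offset} is neither retained nor archived")
    return out


archive: Dict[int, int] = {}
hot = [(0, 10), (1, 20)]
cp = archive_new_entries(hot, archive, 0)
hot = [(1, 20), (2, 30)]  # offset 0 has aged out of the hub
cp = archive_new_entries(hot, archive, cp)
print(read_for_rebuild(0, hot, archive))  # [10, 20, 30]
```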
When our usage metrics showed that many of our customers had one Event Hubs consumer group dedicated to pushing data to an archival store, we wanted to enable this capability out of the box, and so we started offering the Event Hubs Capture feature.
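Capture is configured with a time window and a size window, and writes a batch to the archival store when whichever limit is reached first. The sketch below illustrates only that windowing idea, using a hypothetical `WindowedArchiver` class and a list standing in for blobs; it is not how Capture is implemented and uses no Azure APIs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WindowedArchiver:
    """Hypothetical batcher: flushes when either the time window or the size window is reached first."""
    time_window_s: float
    size_window_bytes: int
    files: List[List[bytes]] = field(default_factory=list)  # stands in for blobs in an archival store
    _batch: List[bytes] = field(default_factory=list)
    _batch_bytes: int = 0
    _window_start: float = 0.0

    def on_event(self, body: bytes, now_s: float) -> None:
        if not self._batch:
            self._window_start = now_s  # a new window opens with the first event of a batch
        self._batch.append(body)
        self._batch_bytes += len(body)
        self._maybe_flush(now_s)

    def tick(self, now_s: float) -> None:
        # Called periodically so that a quiet stream still flushes on the time window.
        self._maybe_flush(now_s)

    def _maybe_flush(self, now_s: float) -> None:
        if not self._batch:
            return
        if (now_s - self._window_start >= self.time_window_s
                or self._batch_bytes >= self.size_window_bytes):
            self.files.append(self._batch)
            self._batch, self._batch_bytes = [], 0


archiver = WindowedArchiver(time_window_s=60, size_window_bytes=10)
archiver.on_event(b"abcdef", now_s=0)
archiver.on_event(b"ghijk", now_s=1)  # 11 bytes: size window reached first
archiver.on_event(b"x", now_s=2)
archiver.tick(now_s=62)               # 60 s elapsed: time window reached first
print(archiver.files)
```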