One of the great promises of Event Sourcing is the ability to replay events. When there's no relationship between entities (e.g. blob storage, user profiles) it works great, but how do you replay quickly when there are important relationships to check?
For example: `Product(id, name, quantity)` and `Order(id, list of productIds)`. If we have a `CreateProduct` event and then a `CreateOrder` event, the order will succeed (the product is available in the warehouse). This is easy to implement, e.g. with Kafka (one topic with `n1` partitions for products, another with `n2` partitions for orders).
During replay everything happens more quickly, and events from different topics or partitions may be consumed in a different relative order (e.g. `CreateOrder` and then `CreateProduct`), which gives us different behavior than originally (`CreateOrder` will now fail because the product doesn't exist yet). That's because Kafka guarantees ordering only within a single partition of a single topic. The easy solution would be to put everything into one huge topic with one partition, but this would be completely unscalable, as single-threaded replay of bigger databases could take days at least.
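To make the problem concrete, here is a minimal sketch in plain Python (no Kafka involved). The `Warehouse` class, the `replay` function and the tuple-based event shape are hypothetical names chosen for illustration; only `CreateProduct`, `CreateOrder` and the entity fields come from the question.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    """Hypothetical in-memory read model built by applying events in order."""
    products: dict = field(default_factory=dict)   # productId -> quantity
    orders: dict = field(default_factory=dict)     # orderId -> list of productIds
    rejected: list = field(default_factory=list)   # orderIds that failed validation

    def apply(self, event):
        kind, data = event
        if kind == "CreateProduct":
            self.products[data["id"]] = data["quantity"]
        elif kind == "CreateOrder":
            # The relationship check: every referenced product must already exist.
            if all(pid in self.products for pid in data["productIds"]):
                self.orders[data["id"]] = data["productIds"]
            else:
                self.rejected.append(data["id"])


def replay(events):
    """Apply events one by one, in the order given, to a fresh state."""
    state = Warehouse()
    for event in events:
        state.apply(event)
    return state


create_product = ("CreateProduct", {"id": "p1", "name": "kettle", "quantity": 5})
create_order = ("CreateOrder", {"id": "o1", "productIds": ["p1"]})

# Original order: the product exists when the order arrives.
original = replay([create_product, create_order])
# Replay where the two topics are consumed in a different relative order.
reordered = replay([create_order, create_product])

print(original.orders, original.rejected)
print(reordered.orders, reordered.rejected)
```

The same two events produce an accepted order in the first run and a rejected order in the second, which is exactly the divergence described above.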
Is there any existing, better solution for quick replaying of related entities? Or should we forget about event sourcing and replaying of events when we need to check relationships in our databases, and replaying is good only for unrelated data?
I think I found the solution for scalable (multi-partition) event sourcing:
- a topic named `messages`
- users assigned to partitions (e.g. `murmurHash(login) % partitionCount`)
- for the entities involved (`Product`, `Order`), every partition should contain its own copy of the data
- `CreateOrder` events will be processed quickly without leaving the user's partition
- in the `Product` / `Order` domain, partitions could work similarly to Walmart/Tesco stores around a country, and the messages sent between partitions ('stores') could be like `CreateProduct`, `UpdateProduct`, `CreateOrder`, `SendProductToMyPartition`, `ProductSentToYourPartition`
This way even when Kafka (or any other event sourcing system) chooses to reorder messages between partitions, we'll still be ok, because we don't ever read any data outside our single-threaded 'island'.
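A rough sketch of the 'island' idea, under several assumptions of mine: `zlib.crc32` stands in for `murmurHash` (which is not in the Python standard library), `CreateProduct` is simply appended to every partition's log as one possible reading of "own copy of the data", and `Island`, `build_logs`, `replay` and the `login` field on orders are hypothetical names. The inter-store messages such as `SendProductToMyPartition` are not modelled here.

```python
import zlib
from collections import defaultdict

PARTITION_COUNT = 4  # assumed value, for illustration only


def partition_for(login: str) -> int:
    # Stand-in for murmurHash(login) % partitionCount; any stable hash works.
    return zlib.crc32(login.encode("utf-8")) % PARTITION_COUNT


class Island:
    """State owned by one partition; it never reads data from other partitions."""

    def __init__(self):
        self.products = {}   # local copy: productId -> quantity
        self.orders = {}     # orderId -> list of productIds
        self.rejected = []   # orderIds that failed the local check

    def handle(self, event):
        kind, data = event
        if kind == "CreateProduct":
            self.products[data["id"]] = data["quantity"]
        elif kind == "CreateOrder":
            if all(pid in self.products for pid in data["productIds"]):
                self.orders[data["id"]] = data["productIds"]
            else:
                self.rejected.append(data["id"])


def build_logs(commands):
    """Write each command to per-partition logs, preserving order inside each log."""
    logs = defaultdict(list)
    for kind, data in commands:
        if kind == "CreateProduct":
            # Assumption: every partition receives its own copy of product data.
            for p in range(PARTITION_COUNT):
                logs[p].append((kind, data))
        else:
            logs[partition_for(data["login"])].append((kind, data))
    return logs


def replay(logs, partition_order):
    """Replay partitions in any order; each island only sees its own log."""
    islands = {}
    for p in partition_order:
        island = Island()
        for event in logs[p]:
            island.handle(event)
        islands[p] = island
    return islands


commands = [
    ("CreateProduct", {"id": "p1", "name": "kettle", "quantity": 5}),
    ("CreateOrder", {"id": "o1", "login": "alice", "productIds": ["p1"]}),
    ("CreateOrder", {"id": "o2", "login": "bob", "productIds": ["p1"]}),
]
logs = build_logs(commands)

forward = replay(logs, range(PARTITION_COUNT))
backward = replay(logs, reversed(range(PARTITION_COUNT)))

# The result does not depend on the order in which partitions are replayed.
same = all(forward[p].orders == backward[p].orders for p in range(PARTITION_COUNT))
print(same)
```

Because each island validates orders only against its own log, the partitions can be replayed in parallel or in any sequence. The cost is the duplicated product data and the extra messages needed to keep the copies in step.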
EDIT: As @LeviRamsey noted, this 'single-threaded island' is basically the actor model, and frameworks like Akka can make it a bit easier.