I'm new to event sourcing. From what I've read and seen, event stores are described as having very basic capabilities / a limited interface, something along these lines:
```typescript
getStreamForId: (streamId: ID) => Event[];
appendToStream: (streamId: ID, expectedVersion: number, events: Event[]) => void;
```
While this works fine and most DB architectures would suffice, what about replaying events in order to create new projections? Wouldn't you need to pick a DB architecture that allows more sophisticated querying, so you can build projections based on certain event types?
Example: Let's say I have three aggregates: order, customer, and invoice. For simplicity's sake, there are 12 types of events in total.
Now, after a few months, a new business need arises that can be satisfied with a new report projecting the total order amount per customer. In order to create this projection I need event types 2, 5 and 12.
How would you replay these events if the event store has limited query capabilities?
My concrete case: I have to decide on a DB infrastructure for a project. I was thinking about DynamoDB, which would work fine for the limited interface given above, but has very narrow querying capabilities. Since there's a good chance that new projection/report requirements will come up down the line, I wonder whether I am missing something fundamental about event sourcing, or whether the query capabilities that replaying demands of an event store are a detail that is left out of the event sourcing material I've read.
Using a different DB (e.g. Mongo) it would be easy to query only events of types 2, 5 and 12 and run them through a projector, something like the sketch below.
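To illustrate, here is a rough sketch of what I mean by a projector (the event shape and field names are made up):

```typescript
// Made-up event shape; types 2, 5 and 12 are the ones relevant to the report.
type StoredEvent = {
  streamId: string;
  type: number;
  customerId: string;
  amount?: number;
};

// Fold the relevant events into a "total order amount per customer" read model.
function projectTotalOrderAmountPerCustomer(events: StoredEvent[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const event of events) {
    if (![2, 5, 12].includes(event.type)) continue;
    const current = totals.get(event.customerId) ?? 0;
    totals.set(event.customerId, current + (event.amount ?? 0));
  }
  return totals;
}
```

With Mongo, the input could come from a query along the lines of `db.events.find({ type: { $in: [2, 5, 12] } })` (collection name made up), but with DynamoDB I don't see an equivalent.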
That interface is a minimal one for an event-sourced aggregate. (There are event store implementations that don't need to expose an expected version, because some other component, e.g. Cluster Sharding in a typical Akka setup, prevents concurrent updates, or at least makes them sufficiently rare.)
An event store (or the application code interacting with the event store) is free, on appending to a stream, to also append the events (perhaps with some metadata) to streams corresponding to their type (or, more often, a tag, so that related events can be persisted together without having to reconstruct their ordering).
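A minimal sketch of doing this in application code, using a TypeScript rendering of the interface from the question (all names here are hypothetical, and `expectedVersion` is taken to be the current length of the stream):

```typescript
type DomainEvent = { type: number; payload?: unknown };

interface EventStore {
  getStreamForId(streamId: string): DomainEvent[];
  appendToStream(streamId: string, expectedVersion: number, events: DomainEvent[]): void;
}

// Assumed mapping from event to tags; in practice this is domain knowledge.
const tagsFor = (event: DomainEvent): string[] =>
  [2, 5, 12].includes(event.type) ? ["order-reporting"] : [];

function appendWithTags(
  store: EventStore,
  streamId: string,
  expectedVersion: number,
  events: DomainEvent[]
): void {
  store.appendToStream(streamId, expectedVersion, events);

  // Fan the events out to one stream per tag so a projector can later read
  // everything it cares about from a single, already-ordered stream.
  for (const event of events) {
    for (const tag of tagsFor(event)) {
      const tagStream = `tag-${tag}`;
      const tagVersion = store.getStreamForId(tagStream).length;
      store.appendToStream(tagStream, tagVersion, [event]);
    }
  }
}
```

Note that the two appends here are not atomic; turning "every tagged event eventually lands in its tag stream" into an actual guarantee is where the subtlety mentioned below comes in.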
This is basically how Akka Persistence's plugin for Cassandra (like DynamoDB, also a Dynamo-inspired store) provides an eventsByTag query. Note that there's a lot of subtlety involved in making a guarantee that every event with a given tag will eventually make it to the appropriate streams.
As noted, this can be done in the application if you're directly dealing with the streams. At the limit, based on this API, an aggregate about to persist an event can notify a process that it's going to persist however many events from the current version/offset, and only persist the events after acknowledgement by that process. The process is itself event-sourced: its state is basically a set of stream IDs and version/expected event counts. It subscribes to those stream IDs and routes events to the appropriate tagged streams.
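A rough sketch of such a routing process, reusing the hypothetical EventStore and DomainEvent types from the sketch above (for brevity the router's state lives in memory here, whereas the description above has the router persist its own state as events):

```typescript
type Intent = { streamId: string; fromVersion: number; count: number };

class TagRouter {
  // How far each known stream has been routed into tag streams.
  private routedUpTo = new Map<string, number>();

  constructor(
    private store: EventStore,
    private tagsFor: (event: DomainEvent) => string[]
  ) {}

  // Called by an aggregate before it persists; the aggregate waits for this
  // acknowledgement before actually appending its events.
  acknowledgeIntent(intent: Intent): void {
    if (!this.routedUpTo.has(intent.streamId)) {
      this.routedUpTo.set(intent.streamId, intent.fromVersion);
    }
  }

  // Run periodically (or when told a write completed): pull any events that
  // appeared past the routed offset and copy them to their tag streams.
  routePending(): void {
    for (const [streamId, routed] of this.routedUpTo) {
      const newEvents = this.store.getStreamForId(streamId).slice(routed);
      for (const event of newEvents) {
        for (const tag of this.tagsFor(event)) {
          const tagStream = `tag-${tag}`;
          const tagVersion = this.store.getStreamForId(tagStream).length;
          this.store.appendToStream(tagStream, tagVersion, [event]);
        }
      }
      this.routedUpTo.set(streamId, routed + newEvents.length);
    }
  }
}
```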
Obviously, it's less effort to map out ahead of time which combinations of events are likely to be of interest and to feed those events to tagged/typed streams. The events of interest are likely to be domain events (the events you'd be emitting if you were doing an event-driven architecture rather than event sourcing), while other events are more artifacts of the fact that the aggregate is event-sourced and are less likely to be needed, so it's often useful to have a firehose stream of every domain event: later remixes can subscribe to the firehose.

If you later discover a consumer that would like a type of event you didn't expect there to be interest in, that's a little more work, but assuming you know which streams belong to your aggregates, you can scan through them and backfill the new stream. This is a last resort, but it's possible (in contrast, with destructive-update-style persistence, you can't in general answer the "what was the state at this time?" question).
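For concreteness, that last-resort backfill might look something like this (again reusing the hypothetical EventStore from the sketches above):

```typescript
// One-off job: scan the known aggregate streams and fill a newly introduced tag stream.
function backfillTagStream(
  store: EventStore,
  aggregateStreamIds: string[],
  tag: string,
  wanted: (event: DomainEvent) => boolean
): void {
  const tagStream = `tag-${tag}`;
  for (const streamId of aggregateStreamIds) {
    for (const event of store.getStreamForId(streamId)) {
      if (!wanted(event)) continue;
      const tagVersion = store.getStreamForId(tagStream).length;
      store.appendToStream(tagStream, tagVersion, [event]);
    }
  }
}
```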