cqrs event-sourcing ncqrs

Reporting in the CQRS/ES world


I think I understand the idea of the read model in the context of ES + CQRS (please correct me if not). However, I still have a few doubts about using it in the context of ‘serious’ reporting. Let us say I use a relational db plus some ORM to crud my read models. One basic ‘summary stats read model’ could look like this:

class SummaryStats1
{
    public Guid TypeId { get; set; }
    public string TypeName { get; set; }
    public Guid SubTypeId { get; set; }
    public string SubTypeName { get; set; }
    public int Count { get; set; }
}

Given an event:

TypeId = 3acf7d6f-4565-4672-985d-a748b7706a3e
TypeName = Bla1
SubTypeId = 41532aa1-f5d1-4ec4-896b-807ad66f75fc
SubTypeName = Bla2

The normaliser would:

(1) Check whether there is an instance for the above combination (defined by TypeId, TypeName, SubTypeId, SubTypeName).
(2) If there is no instance, create one and set Count to one. If there is one, increase its Count by one.
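A minimal sketch of those two steps, assuming an in-memory dictionary in place of the relational table and ORM. The event type `ItemClassified` and the class `SummaryStatsNormaliser` are hypothetical names introduced only for illustration:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event type; its fields mirror the example event above.
public record ItemClassified(Guid TypeId, string TypeName, Guid SubTypeId, string SubTypeName);

public class SummaryStats1
{
    public Guid TypeId { get; set; }
    public string TypeName { get; set; } = "";
    public Guid SubTypeId { get; set; }
    public string SubTypeName { get; set; } = "";
    public int Count { get; set; }
}

public class SummaryStatsNormaliser
{
    // In-memory stand-in for the SummaryStats1 table, keyed by the full combination.
    private readonly Dictionary<(Guid, string, Guid, string), SummaryStats1> _rows = new();

    public IReadOnlyCollection<SummaryStats1> Rows => _rows.Values;

    public void Handle(ItemClassified e)
    {
        var key = (e.TypeId, e.TypeName, e.SubTypeId, e.SubTypeName);

        // (1) Look for an existing row for this combination.
        if (_rows.TryGetValue(key, out var row))
        {
            // (2) Found: increase the count by one.
            row.Count += 1;
            return;
        }

        // (2) Not found: create the row with Count set to one.
        _rows[key] = new SummaryStats1
        {
            TypeId = e.TypeId,
            TypeName = e.TypeName,
            SubTypeId = e.SubTypeId,
            SubTypeName = e.SubTypeName,
            Count = 1
        };
    }
}
```

Against a real database, the check-then-insert pair can race when several handlers run concurrently, so it would presumably need a transaction or a unique constraint on the combination.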

Is this the acceptable reporting approach? I guess one can run very efficient selects against this de-normalised data structure (for filtering and other sql ‘projections’):

SELECT TypeName, SUM(Count) FROM SummaryStats1 GROUP BY TypeName

Would CQRS/ES experts agree with this? Is this 'the way' of doing things (i.e. create these dedicated report read models)? Any references to source code/real examples would be very much appreciated.


Solution

  • Is this the acceptable reporting approach?

    Whether it's the right reporting approach depends, of course, on your requirements, but the general idea is correct.

    In summary:

    You generate your read models (the official term sometimes used is Eager Read Derivation) based on events coming from your domain.

    The read models can live in anything you want (SQL, Redis, MongoDB, etc.): whatever makes your queries performant. In your example, for instance, there's no reason why you couldn't have two read models to serve your queries even more efficiently (although what you describe is likely enough for most cases):

    1. your SQL view as described
    2. a pre-aggregated view grouped by TypeName, so that you don't have to do the grouping at query time on every request (instead, the normalizer maintains the grouped totals).
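As a rough sketch of the second option, the normalizer could keep one running total per TypeName, so the GROUP BY from the question turns into a plain lookup. `ItemClassified` and `TypeNameTotals` are hypothetical names, and the dictionary stands in for whatever store you would actually use:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event type carrying the fields from the question's example.
public record ItemClassified(Guid TypeId, string TypeName, Guid SubTypeId, string SubTypeName);

// Pre-aggregated read model: one running total per TypeName.
public class TypeNameTotals
{
    private readonly Dictionary<string, int> _totals = new();

    // Called by the normalizer for every incoming event.
    public void Handle(ItemClassified e)
    {
        _totals.TryGetValue(e.TypeName, out var current); // current is 0 when the key is absent
        _totals[e.TypeName] = current + 1;
    }

    // Query side: no grouping needed, just a lookup.
    public int CountFor(string typeName) =>
        _totals.TryGetValue(typeName, out var total) ? total : 0;
}
```

The trade-off is a little extra work on every event in exchange for cheaper reads, which tends to pay off only when the grouped query is run often.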

    In short, there's no right or wrong way to construct your read models. The beauty is precisely that you're completely free to shape your read models however you want (based on the query patterns and performance bottlenecks you foresee) without having to think about how those models affect writes (they don't, since CQRS splits reads and writes).

    Using event sourcing in conjunction with CQRS opens up even nicer possibilities, namely creating new read models and populating them simply by replaying past events from the event store.
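To illustrate replaying, here is a small sketch in which a read model that did not exist when the events were recorded is built from the stored history. The event store is assumed to be nothing more than an ordered sequence of events; `ItemClassified`, `SubTypeNameTotals` and `Replayer` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event type carrying the fields from the question's example.
public record ItemClassified(Guid TypeId, string TypeName, Guid SubTypeId, string SubTypeName);

// A read model introduced after the fact: counts per SubTypeName.
public class SubTypeNameTotals
{
    public Dictionary<string, int> Totals { get; } = new();

    public void Handle(ItemClassified e)
    {
        Totals.TryGetValue(e.SubTypeName, out var current); // 0 when absent
        Totals[e.SubTypeName] = current + 1;
    }
}

public static class Replayer
{
    // Feeds every stored event, in its original order, into a fresh read model.
    public static SubTypeNameTotals Rebuild(IEnumerable<ItemClassified> storedEvents)
    {
        var model = new SubTypeNameTotals();
        foreach (var e in storedEvents)
        {
            model.Handle(e);
        }
        return model;
    }
}
```

Because the model is rebuilt with the same `Handle` method that would process live events, it can simply keep receiving new events once the replay has caught up.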

    Just some extra examples of what might be considered a 'read model' of your data:

    • an INCR-based counter view in Redis (an alternative to what you seem to describe)
    • an Elasticsearch/Solr search index
    • a KV store/index for quick lookups by key

    The idea, again, is that these 'read models'/views are kept up to date (eventually consistent) by pushing update events to them (usually by means of pub/sub).
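A very small sketch of that fan-out, assuming a hypothetical in-process `EventBus` rather than a real message broker. Each subscriber stands in for one read model, and all names here are made up for illustration:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event type carrying the fields from the question's example.
public record ItemClassified(Guid TypeId, string TypeName, Guid SubTypeId, string SubTypeName);

// Hypothetical in-process bus: every published event is pushed to all subscribers.
public class EventBus
{
    private readonly List<Action<ItemClassified>> _subscribers = new();

    public void Subscribe(Action<ItemClassified> handler) => _subscribers.Add(handler);

    public void Publish(ItemClassified e)
    {
        foreach (var handler in _subscribers)
        {
            handler(e);
        }
    }
}

public static class Demo
{
    public static (int Counter, int IndexSize) Run()
    {
        var bus = new EventBus();
        var counter = 0;                    // stand-in for a counter view
        var index = new HashSet<string>();  // stand-in for a lookup index

        // Two independent read models subscribe to the same event stream.
        bus.Subscribe(_ => counter++);
        bus.Subscribe(e => index.Add(e.SubTypeName));

        bus.Publish(new ItemClassified(Guid.NewGuid(), "Bla1", Guid.NewGuid(), "Bla2"));
        bus.Publish(new ItemClassified(Guid.NewGuid(), "Bla1", Guid.NewGuid(), "Bla3"));

        return (counter, index.Count);
    }
}
```

This bus delivers synchronously, so the views are updated immediately. With a real broker, delivery is typically asynchronous, which is where the eventual consistency comes from.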

    For more good reading see the answer plus links to this question: Read side implementation approaches using CQRS