Search code examples
google-cloud-platformgoogle-bigquerycqrsevent-sourcing

Event sourcing with GCP BigQuery


We are building a system that stores purchase history of customers. Mainly 3 entities,

  • Orders (mutable data as order status keep changing)
  • Returns (again mutable as return can undergo status changes)
  • Invoices (immutable)

For 'Orders' and 'Returns' we are going to subscribe to pubsub events from another GCP project. Eventually, we need to have event log of all events and also the the current state of the 'orders' and 'returns'. In my view BigQuery would do a great job storing events, as it is mainly an append-only db and it can help in tracking events back in time. However, some users of BigQuery are only interested in the final state.

Eg:
Events | State
----------------     
Order1 | created
Order1 | in_Progress
Order2 | created
Order1 | out_for_Delivery

So the current state would be
Events | State
----------------     
Order1 | out_for_Delivery
Order2 | created

I could see some options as below,

Option1: Store all events in BigQuery. And for creating the final state create BQ Views on top of the table that sort by timestamp and pick the latest version

Option2: I read about Projection queries (using CQRS) which help to build the application state at any point back in time. But I am not sure how this is practically done.

Please guide me here as to which option is better and also some more details on option2 which is a standard practice for building "Event sourcing + CQRS" kind of systems.

Regards, Prasad.


Solution

  • If you need to filter your table and show only the last status for your events, using a View is the easiest and best option. If you have a timestamp column in your event tables you can create a view using a query like below for example:

    SELECT
       event, 
       state, 
       timestamp_column
    FROM (
       SELECT
           event, 
           state, 
           timestamp_column,
           ROW_NUMBER() OVER(PARTITION BY event ORDER BY timestamp_column DESC) AS rn
       FROM
           `your-event-table`
         ) 
    WHERE rn =1
    

    What this query is basically doing is:

    1. Grouping you table by the events
    2. Numbering the rows inside each group from the most recent registries to the older ones
    3. Selecting only the rows whose number is 1, which means the final result will be the most recent registry for all the events.