Event-sourced many-to-many aggregates

We have an event-sourced system in which we cache data in aggregates for performance reasons.

Let's say we have three entities: Patient, Doctor, and Appointment.

What works well for us is one-to-many relations. For example, imagine the following events:

* DOCTOR_CREATED
* DOCTOR_ARRIVED
* DOCTOR_LEFT

We could aggregate these into a one-to-many relation. When a Doctor is added to the system, we create a row in the database.

Every time they arrive at work, we add that time to the doctor record, and every time they leave we add that too. So we could end up with something like this:

{
  "id": 23,
  "name": "Dr Bill",
  "totalHoursWorked": 124,
  "timesheet": [
    {
      "arrivedAt": "2022-01-04 09:00:00",
      "leftAt": "2022-01-04 14:00:00",
      "hoursWorked": 5
    },
    // etc...
  ]
}
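
As an illustration, the fold behind a record like this could be sketched in Python as follows. This is only a minimal sketch: the event payload fields (`id`, `name`, `at`) and the function name `apply_doctor_event` are assumed for the example, and it assumes every DOCTOR_LEFT follows a matching DOCTOR_ARRIVED.

```python
from datetime import datetime
from functools import reduce

FMT = "%Y-%m-%d %H:%M:%S"

def apply_doctor_event(state, event):
    """Pure reducer: (previous state, event) -> new state; inputs are not mutated."""
    kind = event["type"]
    if kind == "DOCTOR_CREATED":
        return {"id": event["id"], "name": event["name"],
                "totalHoursWorked": 0, "timesheet": []}
    if kind == "DOCTOR_ARRIVED":
        # Open a new timesheet entry; leftAt is filled in by DOCTOR_LEFT.
        entry = {"arrivedAt": event["at"], "leftAt": None, "hoursWorked": 0}
        return {**state, "timesheet": state["timesheet"] + [entry]}
    if kind == "DOCTOR_LEFT":
        # Close the most recent entry (assumed to be the open one).
        *closed, last = state["timesheet"]
        delta = (datetime.strptime(event["at"], FMT)
                 - datetime.strptime(last["arrivedAt"], FMT))
        hours = delta.total_seconds() / 3600
        entry = {**last, "leftAt": event["at"], "hoursWorked": hours}
        return {**state,
                "totalHoursWorked": state["totalHoursWorked"] + hours,
                "timesheet": closed + [entry]}
    return state  # ignore events this aggregate does not care about

# Hypothetical event stream for one doctor.
events = [
    {"type": "DOCTOR_CREATED", "id": 23, "name": "Dr Bill"},
    {"type": "DOCTOR_ARRIVED", "id": 23, "at": "2022-01-04 09:00:00"},
    {"type": "DOCTOR_LEFT", "id": 23, "at": "2022-01-04 14:00:00"},
]
doctor = reduce(apply_doctor_event, events, None)
```

Everything the reducer needs arrives in events for the same doctor, which is why the one-to-many case stays simple.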

No problems there.

Now let's suppose that we want to track the appointments. This is a many-to-many relation between patients and doctors.

I'm interested in these events:

* DOCTOR_CREATED
* PATIENT_CREATED
* APPOINTMENT_CREATED

Because the event stream must be sequential through time, I cannot create an appointment record BEFORE either the relevant doctor or patient has been created.

How do I go about creating a view of the data model from the perspective of the appointments?

Thinking about it in GraphQL terms might help. This is the query I want to optimise:

query {
  appointment {
    day
    time
    patient {
      name
      age
    }
    doctor {
      name
      specialty
    }
  }
}

I'd love to be able to store this data structure in the database as an aggregate, so that fetching an appointment by ID can be done in one database query.

I'm going to run into problems here with timelines, because if I wait for the first APPOINTMENT_CREATED event before I create a row in the database, then I've missed the relevant PATIENT and DOCTOR events.

If I capture the PATIENT and DOCTOR events first in anticipation, then I have to store all possible combinations of Doctor and Patient just on the off-chance one of them might want to have an appointment later on. I'm also faced with this issue of the aggregate having an inconsistent data structure. Rows in the table might be indexed by doctor-patient id, or appointment id depending on what stage of the event stream we have got to.

The only way I can currently think of to do this is to have the aggregate that optimises this query wait for the APPOINTMENT_CREATED event, and then asynchronously query the database to retrieve the patient and doctor records at that point in time.

However, the way we've implemented our system, all our aggregates are built by composing pure functions that take the previous aggregate state and the event in question, and return the new aggregate state.

Is what I want impossible with the architecture we have built so far? Do I need an escape hatch to allow our aggregate hydration to perform async db queries (not keen on this)?

Or is an aggregate the wrong technique for solving this problem, and do I actually need to use something else (like a cache)? Having said that, one of the benefits sold to me of going event-sourced was that we wouldn't have to bother with caching, as we could just pre-build all our aggregates to be read-optimised for the front end.

Solution

I can categorically state that event sourcing does not confer a benefit of not having to bother with caching, nor does it really have anything to do with pre-building aggregates to be read-optimized for the front end. Read-optimized data models are the domain of CQRS, which is not the same thing at all as event sourcing (neither requires the other, though they are a really good fit).

The read-side in a CQRS/ES system is generally not going to be event-sourced: the reason you're doing CQRS alongside event-sourcing is that event-sourced models are terrible for queries, so you're implementing a non-event-sourced data model (e.g. relational or NoSQL documents...) to allow for effective queries.

So don't feel like you need to use the same structures in the read-side as you'd use in the write-side. Event-sourcing with the same aggregates as the write-side is especially unlikely to be beneficial.

If you do use event-sourced aggregates in the read-side, note that these are going to be different from the aggregates in the write-side. For instance, the process of constructing a denormalized view of the appointments might be a fine fit for an event-sourced aggregate:

* On an APPOINTMENT_CREATED event (which presumably includes a doctor ID and a patient ID), record that this process began to construct a denormalized view.
* A subscriber for that "began to construct..." event then replays events for the appropriate doctor and patient (any event store should support this: replaying the events for a specific aggregate is the canonical operation in event sourcing) and uses those events to build a "here's the doctor (resp. patient) information for this appointment" command to the process aggregate.
* The aggregate records the information found and might even record an "everything found" event.
* Another subscriber is building the denormalized view to be saved for easy querying by appointment ID.

Note that in this, the "denormalization process" aggregate is pure; the side effects happen in projections of its events.
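
To make the shape of that process aggregate concrete, here is a minimal Python sketch. All event and command names (`DENORMALIZATION_STARTED`, `RECORD_DOCTOR`, `EVERYTHING_FOUND`, and so on) and payload fields are hypothetical, chosen for the example. The subscribers that replay doctor and patient events and write the view are deliberately left out, since that is where the side effects would live.

```python
def apply(state, event):
    """Pure fold for the hypothetical denormalization-process aggregate."""
    kind = event["type"]
    if kind == "DENORMALIZATION_STARTED":
        return {"appointmentId": event["appointmentId"],
                "doctorId": event["doctorId"], "patientId": event["patientId"],
                "doctor": None, "patient": None, "complete": False}
    if kind == "DOCTOR_INFO_RECORDED":
        return {**state, "doctor": event["doctor"]}
    if kind == "PATIENT_INFO_RECORDED":
        return {**state, "patient": event["patient"]}
    if kind == "EVERYTHING_FOUND":
        return {**state, "complete": True}
    return state

def decide(state, command):
    """Pure command handler: returns the events to append and performs no I/O."""
    kind = command["type"]
    if kind == "START":  # issued by whatever observes APPOINTMENT_CREATED
        return [{"type": "DENORMALIZATION_STARTED", **command["appointment"]}]
    if kind in ("RECORD_DOCTOR", "RECORD_PATIENT"):
        role = "doctor" if kind == "RECORD_DOCTOR" else "patient"
        other = "patient" if role == "doctor" else "doctor"
        recorded = {"type": f"{role.upper()}_INFO_RECORDED", role: command["info"]}
        # Once both halves are known, also emit the "everything found" event.
        if state[other] is not None and not state["complete"]:
            return [recorded, {"type": "EVERYTHING_FOUND"}]
        return [recorded]
    return []

def handle(state, command):
    """Decide, then fold the resulting events back into the state."""
    for event in decide(state, command):
        state = apply(state, event)
    return state

# The RECORD_* commands would be sent by the subscriber that replayed the
# doctor and patient streams; here they are written out by hand.
state = handle(None, {"type": "START", "appointment":
                      {"appointmentId": 7, "doctorId": 23, "patientId": 41}})
state = handle(state, {"type": "RECORD_DOCTOR",
                       "info": {"name": "Dr Bill", "specialty": "Cardiology"}})
state = handle(state, {"type": "RECORD_PATIENT",
                       "info": {"name": "Ann", "age": 52}})
```

A projection listening for `EVERYTHING_FOUND` could then write the combined document keyed by appointment ID.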

Would I do this in practice? Probably not. I might use a relational DB and have readers do a join. I might just have the appointment created events kick off async queries to build a denormalized view. Still, this sort of thing is possible.
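
As a rough sketch of the relational option, the read side could project each event into its own table and let the reader join by appointment ID. The example below uses Python's built-in `sqlite3` with an in-memory database; the table layout, event payload fields, and the `project` and `fetch_appointment` helpers are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE doctors  (id INTEGER PRIMARY KEY, name TEXT, specialty TEXT);
    CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE appointments (
        id INTEGER PRIMARY KEY, day TEXT, time TEXT,
        doctor_id INTEGER REFERENCES doctors(id),
        patient_id INTEGER REFERENCES patients(id)
    );
""")

def project(event):
    """Projection: each event becomes one row insert, applied in stream order."""
    kind = event["type"]
    if kind == "DOCTOR_CREATED":
        conn.execute("INSERT INTO doctors VALUES (?, ?, ?)",
                     (event["id"], event["name"], event["specialty"]))
    elif kind == "PATIENT_CREATED":
        conn.execute("INSERT INTO patients VALUES (?, ?, ?)",
                     (event["id"], event["name"], event["age"]))
    elif kind == "APPOINTMENT_CREATED":
        conn.execute("INSERT INTO appointments VALUES (?, ?, ?, ?, ?)",
                     (event["id"], event["day"], event["time"],
                      event["doctorId"], event["patientId"]))

def fetch_appointment(appointment_id):
    """One query returns the nested appointment/patient/doctor shape."""
    row = conn.execute("""
        SELECT a.day, a.time,
               p.name AS patient_name, p.age AS patient_age,
               d.name AS doctor_name, d.specialty AS doctor_specialty
        FROM appointments a
        JOIN patients p ON p.id = a.patient_id
        JOIN doctors  d ON d.id = a.doctor_id
        WHERE a.id = ?
    """, (appointment_id,)).fetchone()
    if row is None:
        return None
    return {"day": row["day"], "time": row["time"],
            "patient": {"name": row["patient_name"], "age": row["patient_age"]},
            "doctor": {"name": row["doctor_name"],
                       "specialty": row["doctor_specialty"]}}

for e in [
    {"type": "DOCTOR_CREATED", "id": 23, "name": "Dr Bill", "specialty": "Cardiology"},
    {"type": "PATIENT_CREATED", "id": 41, "name": "Ann", "age": 52},
    {"type": "APPOINTMENT_CREATED", "id": 7, "day": "2022-01-05",
     "time": "10:30", "doctorId": 23, "patientId": 41},
]:
    project(e)

appointment = fetch_appointment(7)
```

Because the doctor and patient rows are written as their own events arrive, nothing has to be anticipated or looked up when APPOINTMENT_CREATED shows up; the join does the combining at read time.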