java spring domain-driven-design spring-data-jdbc

Spring Data JDBC performance when loading the whole aggregate

I have an aggregate root entity called School. The School entity encompasses various entities such as staff, courses, schedule flows, and documents, resulting in more than 20 attributes. Most of these attributes are one-to-many (1-M) relationships, and some of the related entities even have their own aggregate roots.

My question is about adding, modifying, or deleting entities within the aggregate. Currently, I load the entire aggregate tree, perform the necessary modifications, and then save the changes. However, given the amount of data involved, I am concerned that this operation could become expensive.

Would it be more prudent to directly perform modifications in the database tables using their respective repositories? For instance, instead of loading the entire school to update a course, I could use the course repository to update the course by its ID.

I am interested in understanding the performance implications of the first approach and the associated pros and cons.

What steps should we take to avoid performance issues in the future, once we have a lot of data?

Solution

A large aggregate can be a sign of poor aggregate design. An aggregate has a specific purpose: it defines the consistency boundary for operations on several entities that must stay consistent (all elements of the aggregate are coherent with each other). The larger it gets (in the extreme, a single aggregate for your whole system!), the more entities you are trying to keep consistent together. You do keep the whole thing consistent, but at the cost of loading all of that data for every operation. To define an optimal boundary around entities:

You should ask yourself which entities need to be consistent together. For example, what invariants does "school" enforce to keep the whole aggregate consistent throughout your operations? If an entity inside "school" doesn't care about the other entities or about "school" itself, take it out of the aggregate and define another aggregate around it. For example, operations on the "course" entity might not need to check any invariants related to "school", so it might belong in a separate aggregate.
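You should make the aggregate smaller so that each operation loads less data. But not too small, because then it will suffer from consistency issues: invariants that span several aggregates can no longer be enforced in a single transaction. In that case, you have to rely on eventual consistency and patterns like SAGA to keep things consistent (a more complex solution).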
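The two points above can be made concrete with a minimal plain-Java sketch. It assumes, purely hypothetically, that the school's real invariant is "a timetable slot may only be taught by someone on this school's staff", so `Staff` and `ScheduleEntry` stay inside the `School` aggregate, while `Course` has no rule involving the school and becomes its own aggregate that `School` references by ID only. All class and method names are illustrative, and `CourseRepository` is an in-memory stand-in for a Spring Data `CrudRepository<Course, Long>` so the sketch runs without Spring (Java 17, for records).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Child entities that stay inside School because a School invariant spans them (hypothetical).
record Staff(long id, String name) {}
record ScheduleEntry(String slot, long staffId) {}

// Aggregate root. Courses are referenced by ID only, so loading a School does not load them.
class School {
    private final long id;
    private final List<Staff> staff = new ArrayList<>();
    private final List<ScheduleEntry> schedule = new ArrayList<>();
    private final Set<Long> courseIds = new HashSet<>();

    School(long id) { this.id = id; }

    void hire(Staff member) { staff.add(member); }

    void addCourse(long courseId) { courseIds.add(courseId); }

    // Hypothetical invariant: a slot may only be taught by a member of this school's staff.
    void assign(String slot, long staffId) {
        if (staff.stream().noneMatch(s -> s.id() == staffId)) {
            throw new IllegalArgumentException("staff " + staffId + " does not work at school " + id);
        }
        schedule.add(new ScheduleEntry(slot, staffId));
    }

    List<ScheduleEntry> schedule() { return List.copyOf(schedule); }
    Set<Long> courseIds() { return Set.copyOf(courseIds); }
}

// Course is its own aggregate root and guards only its own rules.
final class Course {
    private final long id;
    private String title;

    Course(long id, String title) {
        this.id = id;
        rename(title);
    }

    void rename(String newTitle) {
        if (newTitle == null || newTitle.isBlank()) {
            throw new IllegalArgumentException("course title must not be blank");
        }
        this.title = newTitle;
    }

    long id() { return id; }
    String title() { return title; }
}

// In-memory stand-in for a Spring Data CrudRepository<Course, Long>.
class CourseRepository {
    private final Map<Long, Course> rows = new HashMap<>();

    Optional<Course> findById(long id) { return Optional.ofNullable(rows.get(id)); }

    Course save(Course course) {
        rows.put(course.id(), course);
        return course;
    }
}

// Application service: updating a course loads and saves only that Course, never the School.
class CourseService {
    private final CourseRepository courses;

    CourseService(CourseRepository courses) { this.courses = courses; }

    void renameCourse(long courseId, String newTitle) {
        Course course = courses.findById(courseId)
                .orElseThrow(() -> new IllegalArgumentException("no course " + courseId));
        course.rename(newTitle);
        courses.save(course);
    }
}
```

In Spring Data JDBC, the reference from `School` to its courses could be stored as plain IDs or as `AggregateReference<Course, Long>` values. Either way, persisting a `School` should no longer touch the course rows, and renaming a course no longer requires loading the school.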
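For the "not too small" case in the second point, a common way to keep separate aggregates in line is to change one aggregate in its own transaction, record an event, and let a handler update the other aggregate later. The following is a minimal, hypothetical illustration of that eventual-consistency idea, not a SAGA implementation (there are no compensating actions). `CourseRemoved`, `EventQueue`, and `CourseCatalog` are invented names, and the queue stands in for a message broker or outbox table. In a Spring application, aggregate roots can register such events through Spring Data's `AbstractAggregateRoot`, which the repository publishes on save.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical domain event: a course owned by a school was removed.
record CourseRemoved(long schoolId, long courseId) {}

// Stand-in for a broker or outbox table: events are queued, not applied immediately.
class EventQueue {
    private final Queue<CourseRemoved> pending = new ArrayDeque<>();

    void publish(CourseRemoved event) { pending.add(event); }

    // Delivers queued events; a real system would do this asynchronously and retry on failure.
    void drainTo(Consumer<CourseRemoved> handler) {
        CourseRemoved event;
        while ((event = pending.poll()) != null) {
            handler.accept(event);
        }
    }
}

// Course side: changes only course data, then records what happened.
class CourseCatalog {
    private final Map<Long, String> titles = new HashMap<>();
    private final EventQueue events;

    CourseCatalog(EventQueue events) { this.events = events; }

    void add(long courseId, String title) { titles.put(courseId, title); }

    void remove(long schoolId, long courseId) {
        if (titles.remove(courseId) != null) {
            events.publish(new CourseRemoved(schoolId, courseId));
        }
    }

    boolean contains(long courseId) { return titles.containsKey(courseId); }
}

// School side: holds course IDs and is brought up to date when the event arrives.
class School {
    private final long id;
    private final Set<Long> courseIds = new HashSet<>();

    School(long id) { this.id = id; }

    long id() { return id; }
    void addCourse(long courseId) { courseIds.add(courseId); }
    Set<Long> courseIds() { return Set.copyOf(courseIds); }

    // Event handler: idempotent, so a redelivered event does no harm.
    void on(CourseRemoved event) {
        if (event.schoolId() == id) {
            courseIds.remove(event.courseId());
        }
    }
}
```

Between `remove` and `drainTo`, the school still lists the deleted course. That window of staleness is the extra complexity you accept in exchange for smaller aggregates.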

Some other notes to keep in mind:

Don't prematurely optimize your solution. Design your aggregates to be solid and clean. After that, optimize your solution if you hit performance issues in production.
To design your domain model, consider the trade-off between these three concepts:

Domain model completeness
Domain model purity
Performance

Here is an excellent article by Vladimir Khorikov about these concepts.