
How do I avoid creating overly large DDD aggregates?

I have an aggregate whose root is Film. It contains all the information about a movie plus a list of objects of type Comments (text, a reference to the user, and so on).

Since Film is the root, whenever I load it I must load absolutely all of the film's information, including the comments, even though very often I do not need them at all. For example, if I want a list of all the films, I have no need for the comments on each one, especially not all at once.

I considered putting comments in a separate aggregate, but a comment cannot exist without the film it was written for, so comments are part of the Film aggregate.

What should I do in such cases? Does DDD allow loading the comments in portions, or separately from the Film aggregate?

Solution

Remember that an aggregate is all about state consistency.

It is useful to think about the domain model in terms of operations (for this purpose, both commands and queries, since we're not necessarily doing CQRS). If there are two operations A and B such that B must always see the state changes made by A, that is a very strong signal that A and B belong on the same aggregate. Conversely, if there is an operation C such that no operation on a given aggregate needs to see C's state changes, and C does not need to see the state changes of any operation on that aggregate, that is a sign that C should not be an operation on that aggregate.
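The heuristic above can be made mechanical: treat each "B must always see A's changes" requirement as an edge between two operations, and each connected group of operations becomes a candidate aggregate. The sketch below is a rough illustration of that idea using union-find, not a standard DDD algorithm; the function `candidate_aggregates` and all operation names are hypothetical.

```python
from collections import defaultdict


def candidate_aggregates(operations, must_see):
    """Group operations into candidate aggregates (hypothetical heuristic).

    must_see holds (a, b) pairs meaning "b must always see a's changes".
    Operations linked directly or transitively by such pairs end up together;
    an operation with no pairs ends up in a group of its own.
    """
    parent = {op: op for op in operations}

    def find(op):
        # Path halving keeps the union-find trees shallow.
        while parent[op] != op:
            parent[op] = parent[parent[op]]
            op = parent[op]
        return op

    for a, b in must_see:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for op in parent:
        groups[find(op)].add(op)
    # Sort for deterministic output.
    return sorted(sorted(group) for group in groups.values())


ops = ["create film", "update year of film", "add cast member",
       "post comment", "get comments for film"]
must_see = [("create film", "update year of film"),
            ("create film", "add cast member"),
            ("post comment", "get comments for film")]

print(candidate_aggregates(ops, must_see))
# [['add cast member', 'create film', 'update year of film'],
#  ['get comments for film', 'post comment']]

# One extra requirement linking the two groups collapses them into one.
merged = candidate_aggregates(ops, must_see + [("create film", "post comment")])
print(len(merged))  # 1
```

Because the grouping is transitive, a single cross-cutting requirement such as "posting a comment must see the film's creation" is enough to pull everything back into one large aggregate.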

After assigning operations to aggregates, each aggregate should hold only the state needed to support its operations; any other state in the aggregate is superfluous and should be removed.

In the context of your question, then: does a "get comments for film" operation need to see the changes made by "update year of film" immediately?

Note that you can take two high-level operations that have a "requires consistency" relationship and implement them as a process that ensures their consistency through operations against different aggregates. This works only within the limits of your infrastructure and application layers: something in those layers (e.g. your choice of framework) may make it impractical, if not impossible. The saga pattern may prove useful for this. It adds complexity, because you will typically end up introducing some concurrency-control operations into your domain model (e.g. locks or rollbacks). But if the coordination overhead of a too-large aggregate is causing the system to fail to meet its non-functional requirements, well, sometimes "ya gotta do what ya gotta do".
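To make the "locks or rollbacks" remark concrete, here is a minimal in-memory sketch of an orchestrated saga: each step is paired with a compensating action, and if a step fails, the compensations of the steps that already completed run in reverse order. The names `run_saga` and `SagaFailed` and the step names are hypothetical. A production saga would also need to persist its progress and make steps idempotent, which this sketch does not attempt.

```python
class SagaFailed(Exception):
    pass


def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action raises, the compensations of the steps that already
    completed are run in reverse order, then SagaFailed is raised.
    The failing step's own compensation is not run, since it never completed.
    """
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception as exc:
            for undo in reversed(completed):
                undo()
            raise SagaFailed(f"saga aborted: {exc}") from exc
        completed.append(compensate)


log = []


def fail_to_create_film():
    # Hypothetical failure in the third step.
    raise RuntimeError("film store unavailable")


steps = [
    (lambda: log.append("record intent"), lambda: log.append("cancel intent")),
    (lambda: log.append("reserve title"), lambda: log.append("release title")),
    (fail_to_create_film, lambda: log.append("delete film")),
]

try:
    run_saga(steps)
except SagaFailed as err:
    print(err)  # saga aborted: film store unavailable

print(log)
# ['record intent', 'reserve title', 'release title', 'cancel intent']
```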

For example, suppose the requirement is that a "post comment about film" operation must see the result of a "create film" or "delete film" operation, but of no other operation on the film. Then CommentAboutFilm can be an aggregate separate from Film, and so can a FilmLifecycle aggregate. All three operations go through the lifecycle aggregate. For creation and deletion, it records the intention to create or delete a film (updating its own state), then the Film aggregate is actually created or deleted, and then the lifecycle aggregate records that the operation was performed. Similarly, the post-comment operation goes through the lifecycle aggregate: if the film's creation hasn't yet succeeded, or there is an intent to delete it, the comment is rejected.
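The FilmLifecycle example can be sketched as follows, assuming a single-threaded application service with plain dicts standing in for repositories. `FilmService`, `CommentRejected`, and the status strings are hypothetical. In a real system each step would be its own transaction, which is when the intermediate "creating" and "deleting" states become visible to `post_comment`; this sketch records them but runs each operation to completion.

```python
from dataclasses import dataclass


class CommentRejected(Exception):
    pass


@dataclass
class Film:
    film_id: str
    title: str


@dataclass
class CommentAboutFilm:
    film_id: str
    user_id: str
    text: str


@dataclass
class FilmLifecycle:
    film_id: str
    status: str = "creating"  # "creating", "created", "deleting" or "deleted"


class FilmService:
    """Hypothetical application service; dicts stand in for repositories."""

    def __init__(self):
        self.lifecycles = {}
        self.films = {}
        self.comments = []

    def create_film(self, film_id, title):
        lifecycle = FilmLifecycle(film_id)          # record the intent to create
        self.lifecycles[film_id] = lifecycle
        self.films[film_id] = Film(film_id, title)  # create the Film aggregate
        lifecycle.status = "created"                # record that it succeeded

    def delete_film(self, film_id):
        lifecycle = self.lifecycles[film_id]
        lifecycle.status = "deleting"               # record the intent to delete
        del self.films[film_id]                     # delete the Film aggregate
        lifecycle.status = "deleted"                # record that it succeeded

    def post_comment(self, film_id, user_id, text):
        # Only the lifecycle is consulted; the Film aggregate is never loaded.
        lifecycle = self.lifecycles.get(film_id)
        if lifecycle is None or lifecycle.status != "created":
            raise CommentRejected(f"film {film_id!r} is not accepting comments")
        comment = CommentAboutFilm(film_id, user_id, text)
        self.comments.append(comment)
        return comment


service = FilmService()
service.create_film("f1", "Alien")
service.post_comment("f1", "u1", "A classic.")
service.delete_film("f1")
try:
    service.post_comment("f1", "u2", "Posted after deletion")
except CommentRejected as err:
    print(err)  # film 'f1' is not accepting comments
```

Note that posting a comment touches only the small lifecycle aggregate, so the Film aggregate no longer needs to carry the comment list.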

If your concern about aggregate size is not concurrency or consistency but loading cost, there are technical/implementation tricks outside the scope of the domain model, such as lazy loading. These tricks may require custom support in the infrastructure layer. For example, if you rely on optimistic concurrency control provided by your infrastructure, you may need to implement logic there that allows a change touching only a film's comments to proceed concurrently with adding a cast member.
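Both tricks can be sketched together, assuming an in-memory, single-threaded store as a stand-in for a database. The Film aggregate fetches its comments only on first access, and the store keeps separate version numbers for film details and for comments, so an optimistic check on one part does not conflict with a change to the other. `InMemoryFilmStore`, `ConcurrencyConflict`, and the `expected_version` parameters are hypothetical, not part of any particular framework.

```python
from typing import Callable, List, Optional


class ConcurrencyConflict(Exception):
    pass


class Film:
    """Film aggregate whose comments are loaded lazily (hypothetical design)."""

    def __init__(self, film_id: str, title: str,
                 load_comments: Callable[[str], List[str]]):
        self.film_id = film_id
        self.title = title
        self._load_comments = load_comments
        self._comments: Optional[List[str]] = None

    @property
    def comments(self) -> List[str]:
        # Hit the store only on first access; later accesses reuse the list.
        if self._comments is None:
            self._comments = self._load_comments(self.film_id)
        return self._comments


class InMemoryFilmStore:
    """Versions film details and comments separately, so a comment-only
    change does not conflict with a concurrent cast change."""

    def __init__(self):
        self.details = {}   # film_id -> {"title": str, "cast": list, "version": int}
        self.comments = {}  # film_id -> {"items": list, "version": int}
        self.comment_loads = 0

    def add_film(self, film_id: str, title: str) -> None:
        self.details[film_id] = {"title": title, "cast": [], "version": 0}
        self.comments[film_id] = {"items": [], "version": 0}

    def get_film(self, film_id: str) -> Film:
        return Film(film_id, self.details[film_id]["title"], self._fetch_comments)

    def _fetch_comments(self, film_id: str) -> List[str]:
        self.comment_loads += 1
        return list(self.comments[film_id]["items"])

    @staticmethod
    def _check_and_bump(row: dict, expected_version: int) -> None:
        # Optimistic check scoped to one part of the aggregate.
        if row["version"] != expected_version:
            raise ConcurrencyConflict(
                f"expected version {expected_version}, found {row['version']}")
        row["version"] += 1

    def add_cast_member(self, film_id: str, name: str, expected_version: int) -> None:
        row = self.details[film_id]
        self._check_and_bump(row, expected_version)
        row["cast"].append(name)

    def add_comment(self, film_id: str, text: str, expected_version: int) -> None:
        row = self.comments[film_id]
        self._check_and_bump(row, expected_version)
        row["items"].append(text)


store = InMemoryFilmStore()
store.add_film("f1", "Alien")
films = [store.get_film("f1")]  # listing films loads no comments
print(store.comment_loads)      # 0

# Two writers that each read version 0 of "their" part: no conflict.
store.add_cast_member("f1", "Sigourney Weaver", expected_version=0)
store.add_comment("f1", "A classic.", expected_version=0)

print(films[0].comments)        # ['A classic.']
print(store.comment_loads)      # 1
```

A second `add_comment` that still expects version 0 would raise `ConcurrencyConflict`, because the conflict check is scoped to the comments alone.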