domain-driven-design event-sourcing axon ddd-repositories ddd-service

Best Practices for Managing Unbounded Collections within Domain-Driven Design Aggregates

I'm designing an e-wallet system using Domain-Driven Design (DDD), and I'm facing a challenge regarding the design of my aggregates. Specifically, I have an aggregate representing an e-wallet, which contains a collection of transactions. The transactions list could potentially grow indefinitely over time, and I'm unsure about the best way to handle this unbounded collection within the aggregate.

Here's a simplified example of my current aggregate structure:

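import java.util.List;

// Aggregate root: one e-wallet together with its full transaction history.
public class Wallet {
    private String walletId;
    // Grows without bound for as long as the wallet is in use.
    private List<Transaction> transactions;
}

// Entity owned by the Wallet aggregate.
public class Transaction {
    private String transactionId;
    private UsdAmount amount; // UsdAmount is a money type not shown here
}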

Given that the transactions list could become quite large, I'm concerned about potential performance and scalability issues. I'm also unsure about how to maintain the integrity of the aggregate while dealing with this unbounded collection.

Note: I cannot treat transactions as a standalone aggregate because they lack significant business logic or behavior.

In my attempt to address this challenge, I experimented with the idea of creating a domain service specifically for managing read-only operations on the Transaction entities. This domain service would encapsulate operations such as querying, filtering, and retrieving transactions associated with a wallet. By passing this domain service to any aggregate behavior that needs to work with the transactions list, I aimed to simulate lazy loading and minimize the complexity within the Wallet aggregate itself.

While the approach of using a domain service for read-only transaction operations seemed promising, I'm uncertain about its long-term implications and whether it aligns with best practices in DDD. I'm particularly concerned about potential drawbacks such as increased coupling between the domain service and the Wallet aggregate.

I'm seeking feedback and insights from the community on whether this approach is considered a good practice in DDD, and if there are alternative strategies for managing unbounded collections within aggregates more effectively. Any guidance or suggestions would be greatly appreciated.

Solution

Given that the transactions list could become quite large, I'm concerned about potential performance and scalability issues. I'm also unsure about how to maintain the integrity of the aggregate while dealing with this unbounded collection.

First, a reminder: potential performance and scalability issues are not actual performance and scalability issues. It can make sense to "just" put out the MVP with the straightforward design and defer the problem until you have enough traffic to justify the work (see Jeff Dean, 2009).

The mechanical solution that I often see in event-sourced systems is snapshotting. In effect, you "project" the information you need to remember into a document/report/snapshot, with metadata that describes the point in the stream at which it was generated. To process new messages, you then load that document along with any subsequent events.

If you think about it for a bit, this is what you are already doing, in the sense that when processing all of your events, there is an implicit snapshot of what the aggregate looked like before any events appear. We're just taking that concept and (a) making it explicit, (b) making it persistable, and (c) introducing the option to start from some point other than the beginning of the stream.

(Note: it seems to be common that these snapshots are generated asynchronously - we don't need to lock and update the snapshot when locking and updating the event stream.)
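To make the snapshot idea concrete, here is a minimal in-memory sketch. All of the names (TransactionRecorded, WalletSnapshot, SnapshottedWallet) are hypothetical, amounts are plain cents rather than the question's UsdAmount, and nothing here uses Axon's own snapshotting API. It only shows the shape: a snapshot carries the state plus the stream position, and rehydration applies only the events after that position.

```java
import java.util.List;

// Hypothetical event: one signed amount applied to the wallet.
final class TransactionRecorded {
    final long sequence;     // position in the wallet's event stream (1-based)
    final long amountCents;  // positive = credit, negative = debit

    TransactionRecorded(long sequence, long amountCents) {
        this.sequence = sequence;
        this.amountCents = amountCents;
    }
}

// Hypothetical snapshot: only the state needed to make decisions,
// plus the stream position at which it was taken.
final class WalletSnapshot {
    final long lastSequence;
    final long balanceCents;

    WalletSnapshot(long lastSequence, long balanceCents) {
        this.lastSequence = lastSequence;
        this.balanceCents = balanceCents;
    }

    // The implicit "before any events" snapshot, made explicit.
    static WalletSnapshot empty() {
        return new WalletSnapshot(0, 0);
    }
}

final class SnapshottedWallet {
    private long lastSequence;
    private long balanceCents;

    private SnapshottedWallet(WalletSnapshot snapshot) {
        this.lastSequence = snapshot.lastSequence;
        this.balanceCents = snapshot.balanceCents;
    }

    // Start from the snapshot and apply only the events recorded after it.
    // A real event store would be asked for "events after lastSequence";
    // filtering an in-memory list stands in for that query here.
    static SnapshottedWallet rehydrate(WalletSnapshot snapshot,
                                       List<TransactionRecorded> stream) {
        SnapshottedWallet wallet = new SnapshottedWallet(snapshot);
        for (TransactionRecorded event : stream) {
            if (event.sequence > snapshot.lastSequence) {
                wallet.apply(event);
            }
        }
        return wallet;
    }

    private void apply(TransactionRecorded event) {
        balanceCents += event.amountCents;
        lastSequence = event.sequence;
    }

    // Can be called at any time, e.g. by an asynchronous process.
    WalletSnapshot snapshot() {
        return new WalletSnapshot(lastSequence, balanceCents);
    }

    long balanceCents() {
        return balanceCents;
    }
}
```

Rehydrating from WalletSnapshot.empty() replays the whole stream, which is the "implicit snapshot" case; rehydrating from a later snapshot yields the same state while touching only the tail of the stream.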

The domain solution that I often see makes time explicit in the definition of the aggregate. Instead of tracking every event since the beginning of time, we break them out by fiscal quarter (for example), with processes in place to roll information from one fiscal quarter to the next.

In other words, you replace the unbounded domain process with a sequence of time-boxed domain processes.
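As a rough illustration of the time-boxed variant, the sketch below models one aggregate per wallet per quarter. The class name WalletQuarter, the identifier format, the use of plain cents, and the "no overdraft" rule are all assumptions made for the example; your own invariants and roll-over process will differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical aggregate: one wallet's activity within a single quarter.
// The transaction list is bounded by the length of the quarter.
final class WalletQuarter {
    private final String walletId;
    private final int year;
    private final int quarter;               // 1..4
    private final long openingBalanceCents;  // carried over from the previous quarter
    private final List<Long> transactionsCents = new ArrayList<>();
    private boolean closed;

    WalletQuarter(String walletId, int year, int quarter, long openingBalanceCents) {
        if (quarter < 1 || quarter > 4) {
            throw new IllegalArgumentException("quarter must be 1..4");
        }
        this.walletId = walletId;
        this.year = year;
        this.quarter = quarter;
        this.openingBalanceCents = openingBalanceCents;
    }

    // Assumed invariant for the example: the balance may not go negative.
    void record(long amountCents) {
        if (closed) {
            throw new IllegalStateException("quarter is closed");
        }
        if (balanceCents() + amountCents < 0) {
            throw new IllegalArgumentException("insufficient funds");
        }
        transactionsCents.add(amountCents);
    }

    long balanceCents() {
        long sum = openingBalanceCents;
        for (long amount : transactionsCents) {
            sum += amount;
        }
        return sum;
    }

    // Roll-over: close this quarter and open the next one, carrying only
    // the closing balance forward instead of the transaction history.
    WalletQuarter rollOver() {
        closed = true;
        int nextQuarter = (quarter == 4) ? 1 : quarter + 1;
        int nextYear = (quarter == 4) ? year + 1 : year;
        return new WalletQuarter(walletId, nextYear, nextQuarter, balanceCents());
    }

    String id() {
        return walletId + "-" + year + "-Q" + quarter;
    }

    int transactionCount() {
        return transactionsCents.size();
    }
}
```

Each quarter is loaded and locked on its own, so no single aggregate ever holds more than one quarter's worth of transactions; older quarters remain available for reporting but no longer take part in command handling.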