repository domain-driven-design persistence datamapper ddd-repositories

Is it okay to have more than one repository for an aggregate in DDD?

I've read this question about something similar but it didn't quite solve my problem.

I have an application where I'm required to use data from an API. Problem is there are performance and technical limitations to doing this. The performance limitations are obvious. The technical limitations lie in the fact that the API does not support some of the more granular queries I need to make.

I decided to use MySQL as a queryable cache.

Since the data I needed to retrieve from the API did not change very often, I settled on refreshing the cache once a day, so I didn't need any complicated mapper that checked if we had the data in the cache and if not fell back to the API. That was my first design, but I realized that wasn't very practical when the API couldn't support most of the queries I needed to make anyway.

Now I have a set of two mappers for every aggregate. One for MySQL and one for the API.

My problem is now how I hide the complexities of persistence from the domain, and the fact that it seems that I need multiple repositories.

Ideally I would have an interface that both mappers adhered to, but as previously disclosed that's not possible.

Is it okay to have multiple repositories, one for each mapper?

Solution

Is it okay to have more than one repository for an aggregate in DDD?

Short answer: yes.

Longer answer: you won't find any suggestion of multiple repository in the original book by Evans. As he described things, the domain model would have one representation of the aggregate, and the repository abstraction provided consumers with the illusion that the aggregate was stored in an in-memory collection.

Largely, this makes sense -- you are trying to ensure that writes to data within the aggregate boundary are consistent, so you need a single authority for change.

But... there's no particular reason that reads need to travel through the same code path as writes. Welcome to the world of cqrs. What that gives you immediately is the idea that the in memory representation for reads might need to be optimized differently from the in memory representation used for writes.

In its more general form, you get the idea that the concept that you are modeling might have different representations for each use case.

For your case, where it is sometimes appropriate to read from the RDBMS, sometimes from the API, sometimes both, this isn't quite an exact match -- the repository interface hides the implementation details from the consumer, but you still have to bother with the implementation.

One thing you might look at is your requirements; how fresh does the data need to be in each use case? A constraint that is often relaxed in the CQRS pattern is the idea that the effects of writes are immediately available for reading. The important question to ask would be, if the data hasn't been cached yet, can you simply report "data not available" without hitting the API?

If so, then use cases that access the cached data need only a single repository implementation.