How to validate from aggregate

I am trying to understand validation from the aggregate entity on the command side of the CQRS pattern when using event sourcing.

Basically, I would like to know what the best practice is for handling validation of: 1. Uniqueness of, say, a code. 2. The correctness/validation of an external aggregate's id.

My initial thoughts: I've thought about passing a service into the constructor, but this seems wrong, as the "Create" of the entity should only take the values to assign.

I've thought about validation outside the aggregate, but this seems to put logic somewhere else when I assume it should be the responsibility of the aggregate itself.

Can anyone give me some guidance here?

Solution

Uniqueness of, say, a code.

Ensuring uniqueness is a specific example of set validation. The problem with set validation is that, in effect, you perform the check by locking the entire set. If the entire set is included within a single "aggregate", that's easily done. But if the set spans aggregates, then it is kind of a mess.

A common solution for uniqueness is to manage it at the database level; RDBMS are really good at set operations, and are effectively serialized. Unfortunately, that locks you into a database solution with good set support -- you can't easily switch to a document database, or an event store.
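As a rough sketch of the database-level approach, assuming SQLite through Python's standard `sqlite3` module: the `code_registry` table and the `register_code` helper are hypothetical names made up for this example, not anything from the question. The point is only that the primary key constraint does the set check, so the application never has to read the set before writing.

```python
import sqlite3


def register_code(conn: sqlite3.Connection, aggregate_id: str, code: str) -> bool:
    """Try to claim `code` for an aggregate; return False if it is already taken."""
    try:
        with conn:  # commits on success, rolls back if the insert fails
            conn.execute(
                "INSERT INTO code_registry (code, aggregate_id) VALUES (?, ?)",
                (code, aggregate_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The PRIMARY KEY constraint rejected a duplicate code.
        return False


# Hypothetical schema: one row per claimed code.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE code_registry (code TEXT PRIMARY KEY, aggregate_id TEXT NOT NULL)"
)

first = register_code(conn, "agg-1", "ABC")   # the code is free
second = register_code(conn, "agg-2", "ABC")  # the same code, claimed again
```

Here the first call succeeds and the second is rejected by the constraint rather than by application logic.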

Another approach that is sometimes appropriate is to have the single aggregate check for uniqueness against a cached copy of the available codes. That gives you more freedom to choose your storage solution, but it also opens up the possibility that a data race will introduce the duplication you are trying to avoid.

In some cases, you can encode the code uniqueness into the identifier for the aggregate. In effect, every identifier becomes a set of one.
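One way to picture this, as a sketch under assumptions: derive the aggregate identifier deterministically from the code, and rely on the store refusing to create the same stream twice. `InMemoryEventStore`, `StreamAlreadyExists`, `aggregate_id_for`, and the namespace value are all hypothetical stand-ins invented for the example; a real event store would need to offer an equivalent "create only if the stream does not exist" guarantee.

```python
import uuid

# Hypothetical namespace for this example; any fixed UUID would serve.
CODE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "codes.example.invalid")


class StreamAlreadyExists(Exception):
    """Raised when a stream with the same identifier has already been created."""


class InMemoryEventStore:
    """Toy stand-in for an event store that refuses to create a stream twice."""

    def __init__(self) -> None:
        self._streams = {}

    def create_stream(self, stream_id: str, events: list) -> None:
        if stream_id in self._streams:
            raise StreamAlreadyExists(stream_id)
        self._streams[stream_id] = list(events)


def aggregate_id_for(code: str) -> str:
    """Derive the aggregate id from the code, so equal codes map to one stream."""
    normalized = code.strip().upper()  # assumed normalization rule
    return str(uuid.uuid5(CODE_NAMESPACE, normalized))


def create_item(store: InMemoryEventStore, code: str) -> str:
    """Create the aggregate for `code`; fails if that code was already used."""
    stream_id = aggregate_id_for(code)
    store.create_stream(stream_id, [{"type": "ItemCreated", "code": code}])
    return stream_id
```

With this shape, a second `create_item` call for the same code collides on the stream identifier, so the duplicate is rejected without any cross-aggregate lookup.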

Keep in mind Greg Young's question

What is the business impact of having a failure?

Knowing how expensive a failure is tells you a lot about how much you are permitted to spend to solve the problem.

The correctness/validation of an external aggregate's id.

This normally comes in two parts. The easier one is to validate the data against some agreed upon schema. If our agreement is that the identifier is going to be a URI, then I can validate that the data I receive does satisfy that constraint. Similarly, if the identifier is supposed to be a string representation of a UUID, I can test that the data I receive matches the validation rules described in RFC 4122.
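A minimal sketch of that schema-level check, using only Python's standard library. The function names are made up for the example, and both checks are deliberately loose: the UUID check accepts only the canonical hyphenated form and does not inspect version or variant bits, and the URI check only confirms that a scheme and something after it are present.

```python
import uuid
from urllib.parse import urlparse


def is_canonical_uuid(value: str) -> bool:
    """True if `value` is a UUID in the hyphenated 8-4-4-4-12 hex form."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts braces, "urn:uuid:" prefixes and unhyphenated hex;
    # comparing against the canonical string rejects those variants.
    return str(parsed) == value.lower()


def looks_like_absolute_uri(value: str) -> bool:
    """Loose check: the value has a scheme plus an authority or a path."""
    try:
        parts = urlparse(value)
    except ValueError:
        return False
    return bool(parts.scheme) and bool(parts.netloc or parts.path)
```

Neither function says anything about whether the identifier refers to something that exists; they only check the shape of the data.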

But if you need to check that the identifier is in use somewhere else? Then you are going to have to ask.... The main question in this case is whether you need the answer to that right away, or if you can manage to check that asynchronously (for instance, by modeling "unverified identifiers" and "verified identifiers" separately).
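The "unverified" versus "verified" split could be sketched like this. All of the type names and the `exists` lookup are hypothetical; in practice the lookup would be a query or message to whatever owns the other aggregate, run some time after the command was accepted.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class UnverifiedRef:
    """An identifier we accepted on shape alone; nobody has confirmed it yet."""
    value: str


@dataclass(frozen=True)
class VerifiedRef:
    """An identifier the owning side has confirmed is in use."""
    value: str


@dataclass(frozen=True)
class RejectedRef:
    """An identifier the owning side did not recognize."""
    value: str


def verify(
    ref: UnverifiedRef, exists: Callable[[str], bool]
) -> Union[VerifiedRef, RejectedRef]:
    """Resolve an unverified reference by asking the (assumed) owner of the id."""
    if exists(ref.value):
        return VerifiedRef(ref.value)
    return RejectedRef(ref.value)


# Stand-in for "asking": a fixed set of known identifiers.
known_ids = {"customer-1", "customer-2"}
accepted = verify(UnverifiedRef("customer-1"), known_ids.__contains__)
refused = verify(UnverifiedRef("customer-9"), known_ids.__contains__)
```

Because the three states are separate types, code that requires a confirmed identifier can demand a `VerifiedRef`, while the command side is still free to record an `UnverifiedRef` immediately.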

And of course you once again get to reconcile all of the races inherent in distributed computing.

There is no magic.