DDD: should entities contain or reference other entities?

Let's say I have two entities: Company and Product.
They have the following attributes.

Company	Product
Id	Id
Name	Name
MarketCap	Category
Geography	Price

Here are a few additional facts:

A company has millions of products.
A product can't exist without a company.

Core endpoints I want to support are:

Get all products of a specific category (company information is not needed)
Get all companies (product information is not needed)
Get all products from a company
Get all companies for one product category

I could model the relationship of the entities like this:

Approach 1

// Company
type Company struct {
	Id         int
	Name       string
	MarketCap  int
	Geography  string
	ProductIds []int // references products by id only
}

// Product
type Product struct {
	Id        int
	Name      string
	Category  string
	Price     int
	CompanyId int // references the owning company by id only
}

Or I could model it as follows:

Approach 2

// Company
type Company struct {
	Id        int
	Name      string
	MarketCap int
	Geography string
	Products  []Product // nested product entities
}

// Product
type Product struct {
	Id       int
	Name     string
	Category string
	Price    int
	Company  Company // nested company entity
}

As discussed in a Reddit post, approach 2 is closer to the philosophy of DDD, in the sense that the domain model shouldn't care about storage implementation details. However, when I look at how I will use the data in my API endpoints, I realise how inefficient some endpoints will become.

Let's assume I follow approach 2 and have an endpoint "Get all companies". To reconstruct a company entity, I have to join the company and product tables. Each company object will contain millions of products. Sure, I won't include all the products of each company in the final API response, but I still need to load all of them from the database to create a valid company entity object. If I followed approach 1, I wouldn't need that join, since the ProductIds are cheaper to obtain from a separate join table (companyID, productID).

Question

When should I model a one-to-many or many-to-many relationship with just IDs, and when should I model it as a nested hierarchy in the domain layer?

Solution

Neither approach follows the DDD concept.

DDD is a design concept that states that you should create a single unit of code (called the domain) which contains all code related to modeling your business concepts and the business rules and constraints associated with them. The purpose of this code in the application is to detect and reject operations that would violate business constraints. The domain unit is only useful in state-changing use cases: since read-only use cases cannot alter the system state, they cannot violate business constraints.

In your example, you only list read-only use cases. In this situation, you should not use the domain layer. When using the classic presentation-domain-persistence layered architecture, read-only use cases usually go for a "mediator" pattern, which is similar to a repository but with the interface in the presentation layer and the implementation in the persistence layer. You don't need to go through an intermediate model, so you save a lot of computational power. It also lets you use mapping libraries and eases the implementation of filtering, sorting, and paging of results.
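A minimal sketch of such a read-only path, for the "get all products of a specific category" endpoint. The names ProductView, ProductQueries and inMemoryProductQueries are hypothetical and not from the question, and an in-memory slice stands in for what would normally be a single filtered, paged database query. The interface would sit in the presentation layer and the implementation in the persistence layer.

```go
package main

import "fmt"

// ProductView is a flat read model shaped for the API response.
// It is not a domain entity and carries no company data. (Hypothetical name.)
type ProductView struct {
	Id       int
	Name     string
	Category string
	Price    int
}

// ProductQueries would be declared in the presentation layer.
type ProductQueries interface {
	ByCategory(category string, limit, offset int) []ProductView
}

// inMemoryProductQueries stands in for a persistence-layer implementation
// that would normally push the filter and paging down to the database.
type inMemoryProductQueries struct {
	rows []ProductView
}

// ByCategory returns one page of products in the given category.
func (q inMemoryProductQueries) ByCategory(category string, limit, offset int) []ProductView {
	if limit < 0 {
		limit = 0
	}
	if offset < 0 {
		offset = 0
	}
	matched := []ProductView{}
	for _, r := range q.rows {
		if r.Category == category {
			matched = append(matched, r)
		}
	}
	if offset >= len(matched) {
		return []ProductView{}
	}
	end := offset + limit
	if end > len(matched) {
		end = len(matched)
	}
	return matched[offset:end]
}

func main() {
	var queries ProductQueries = inMemoryProductQueries{rows: []ProductView{
		{Id: 1, Name: "Laptop", Category: "electronics", Price: 1200},
		{Id: 2, Name: "Desk", Category: "furniture", Price: 300},
		{Id: 3, Name: "Phone", Category: "electronics", Price: 800},
	}}
	for _, p := range queries.ByCategory("electronics", 10, 0) {
		fmt.Println(p.Id, p.Name, p.Price)
	}
}
```

No Company or Product entity is built here: the handler asks for exactly the columns and rows it needs, so the "millions of products per company" problem never arises on the read side.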

Also, the domain layer is expected to be persistence agnostic. A common error is to model your business domain the way you would model a database. In particular, you should not try to normalize a domain model, nor try to design a single unified domain model. Domain models should be split into reasonably sized, independent chunks of business rules and use cases, called contexts: product management, company management, ... Each context should have its own model, which should be as small and simple as possible.

At the domain layer, whether a relationship should be modeled by id or by reference depends on whether you have business rules that constrain the relationship. For instance, if products and companies are entities that you can associate without any constraint, you might want to use approach 1, as it is simpler. You might even drop Company.ProductIds. If you have a business rule that uses data across both entities, such as: sum of product.price <= company.marketCap, you will probably want to write something similar to approach 2. The actual implementation will vary depending on whether you want products and companies in the same context or not. If not, the model will be more complex and you will need to look into a more advanced topic called context cooperation.