javascript database postgresql prisma rdbms

Prisma Many to Many relationship with at least one value

I use Prisma and Postgresql for a project to store and manage recipes.

I am struggling to implement a Many to Many relationship where I can constrain one side to require at least one value for the relation field.

Recipes can have one or more aromas. Aromas can be included in zero or more recipes.

I use an explicit relation table because I need to store additional data in it (the quantity for each aroma in a recipe).

The Recipe, Aroma and relation models are the following:

model Recipe {
  id          Int              @id @default(autoincrement())
  name        String
  base        String?          @default("50\/50")
  description String?          @default("aucune description")
  rating      Int?             @default(1)
  aromas      RecipeToAromas[]
}

model Aroma {
  id      Int              @id @default(autoincrement())
  name    String
  brand   Brand?           @relation(fields: [brandId], references: [id])
  brandId Int?             @default(1)
  recipes RecipeToAromas[]

  @@unique([name, brandId], name: "aromaIdentifier")
}

model RecipeToAromas {
  id         Int    @id @default(autoincrement())
  recipeId   Int
  aromaId    Int
  quantityMl Int
  recipe     Recipe @relation(fields: [recipeId], references: [id])
  aroma      Aroma  @relation(fields: [aromaId], references: [id])
}

I want to constrain recipes to have at least one aroma.

By definition Many to Many defines zero to many relationship.

I thought about solving the problem with adding an additional One to Many relationship between Recipe and Aroma.

That would imply adding an additional aroma field in Recipe to store the one aroma that is required (and rename aromas field to additionalAromas to avoid confusion) :

model Recipe {
  id          Int              @id @default(autoincrement())
  name        String
  base        String?          @default("50\/50")
  description String?          @default("aucune description")
  rating      Int?             @default(1)
  aromas      RecipeToAromas[]
  aroma       Aroma            @relation(fields: [aromaId], references: [id])
  aromaId     Int
}

And adding a recipe field in Aroma as it required to establish the relation :

model Aroma {
  id      Int              @id @default(autoincrement())
  name    String
  brand   Brand?           @relation(fields: [brandId], references: [id])
  brandId Int?             @default(1)
  recipes RecipeToAromas[]
  recipe  Recipe[]

  @@unique([name, brandId], name: "aromaIdentifier")
}

But that feels wrong as I will have duplicates : recipes and recipe fields in Aroma would store identical data.

** Edit ** I tried to solve the problem using this solution, it creats a second problem : Each aroma in a recipe has to be unique in this recipe (this is reflected by the compound @unique in the relational database).

If I add the One to Many relationship between Recipe and Aroma, then an aroma can be stored more than once in a recipe :

await prisma.recipe.create({
    data: {
      name: "First recipe",
      aromaId: 1,
      aromas: {
        create: [
          { aromaId: 1, quantityMl: 2 },
          { aromaId: 2, quantityMl: 2 },
          { aromaId: 3, quantityMl: 2 },
        ],
      },
    },
  });

I could of course workaround the problem by just relying on validation in mutation functions and user input. And probably try to add a layer of safety with types as I am using typescript. But I feel like it would make the database brittle and is prone to error especially if I have to collaborate with other devs, or even use the database in a different projet.

I could not find any resource covering a similar situation, and of course I have spend a lot of time searching and re-reading the documentation.

I am new to prisma (started yesterday) and I dont have too much experience with RDBMS, so it feels like I am missing something.

Solution

So your problem can be solved in 2 ways (in raw SQL), one better than the other. Also I'm going to use PostgreSQL syntax as I'm not familiar with prisma but I'm sure this can be translated to the ORM model at best or inserted as raw SQL statements in postgres at worst

Using Triggers

Triggers are associated with specific tables and will be run when a specific action is done in that table, for our case we would want to run this after the Recipe gets a new element. The syntax would be as follows (also feel free to read the documentation here as this is quite a complex topic that can't be summarized in a few sentences)

CREATE OR REPLACE FUNCTION consistency_check_recipe() RETURNS TRIGGER AS $$
  BEGIN
    IF NOT EXISTS (SELECT * FROM RecipeToAromas WHERE recipeId = NEW.id)
      THEN RETURN NULL;
      ELSE RETURN NEW;
    END IF;
  END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE TRIGGER recipe_aroma_minum_check
BEFORE INSERT ON Recipe
FOR EACH ROW EXECUTE FUNCTION consistency_check_recipe();

To understand the syntax above, you have to understand how triggers in Postgres work. Triggers come in many different forms, and triggers can be executed before or after a statement is applied to the database (even if not committed) or even before a transaction is set to commit

The trigger above is set to be invoked when a new row is inserted into the Recipe table, this is the reason why triggers are said to be associated with a specific table, our trigger is set to be called right before each tuple insert ( as apposed to a statement level trigger that is mostly used for auditing ) and before any other checks are done. And from there the trigger will execute the function we made. The important things to note in the function is the fact that it is defined to return a TRIGGER. This a requirement for postgres and even functions that are not to return anything other than void ought to have this in their definition.

Another thing to note is that the function has access to values from the tuple that the trigger caught, this is provided for us through the NEW construct for INSERT and UPDATE triggers and OLD construct for UPDATE and DELETE. In our example this construct provides us with the new tuple that is to be inserted, it is structured exactly as a table, so we can check the ID that is set to inserted and see if it referenced in the RecipeToAromas relation.

If it is not referenced we can return Null to signify to the trigger that that row is not to be inserted, or otherwise we can send a new row that will be inserted in instead, or if we want to insert the original row, we can just send that through the NEW construct, or we could even modify the NEW construct and send that to be inserted.

This example works very well in preventing Insertions of rows not in the RecipeToAromas relation but it presents another problem. If we want to add a Recipe we have to have it be present in at least one RecipeToAromas tuple. This creates a chicken and a egg problem. Since one requires the other one to be present, and vice versa.

To solve this we can look at other types of triggers, notably ones that execute after the INSERT statement and after all other non trigger constraint's have been applied. However this still doesn't offer much help as the trigger is processed after the statement not after the transaction.

To solve this issue we should look at CONSTRAINT Triggers, these triggers are almost the same but they offer the option to defer the execution of them up to the end of the transaction.

Here would be the syntax for them

CREATE OR REPLACE FUNCTION consistency_check_recipe() RETURNS TRIGGER AS $$
  BEGIN
    IF NOT EXISTS (SELECT * FROM RecipeToAromas WHERE recipeId = NEW.ID) THEN
       RAISE EXCEPTION 'Must have at least 1 Aroma';
    END IF;
    RETURN NULL;
  END;
$$ LANGUAGE plpgsql;

CREATE CONSTRAINT TRIGGER recipe_aroma_check
AFTER INSERT OR UPDATE ON Recipe
INITIALLY DEFERRED
FOR EACH ROW EXECUTE FUNCTION consistency_check_recipe();

They are specified using the CONSTRAINT keyword, we also further specify the deference (aka when the function will be executed) with the INITIALLY DEFERRED statement (this is not the only possible value and not the only way to do it, check the docs there is a loooot more) In our case this means that the function will only be executed at the end of the transaction block. Allowing us to do this:

BEGIN;

INSERT INTO Recipe
VALUES (1, 'Borsh', 'Tasty Water', 'Food Nothing more', 100);

INSERT INTO RecipeToAromas
VALUES (DEFAULT, 1, 3, 4);

COMMIT;

We first insert a new Recipe value but because the trigger is only executed at the end of the commit we add that to the database and we also add a link through RecipeToAromas, and only then is the function trigger executed, at which point we see that Recipe is indeed referenced in RecipeToAromas.

A few things to note:

Since this a after the database has modified the rows, nothing we returns matter ( but we still have to return stuff ) so you can just return Null at all times
Since this is inside of a transaction we can just raise an exception to terminate all the changes we made, if you wish to correct them you can still execute DML statements.
NEW is still available.

Things to keep in mind, updates/deletes on the RecipeToAromas or to the Recipe relations may break the data consistency so it is important to keep that in mind and design other triggers/your database around that.
This is not a cheap operation and can harm performance, the usual query optimizations techniques can be used as well as some others specified by the docs.
This is a pretty big topic, so if you intend to do something with triggers, please read the docs.
Functions can be reused for multiple triggers

I might edit this answer further tomorrow (maybe)