Tags: sql, relational-database, one-to-many

One-to-many relationship DB modeling: why do most people choose a foreign key over a composite key?


I’ve got a question about why most people choose a deficient way of modeling a “One to Many” relationship in a relational database.

For simplicity’s sake, let’s consider an app:

  • Screen 1 displays a list of orders.
  • Screen 2 displays the 0…N items of a single order.

That’s pretty much THE basic use case for many “transactional” applications that process documents and workflows.
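
In SQL terms, the two screens boil down to two queries. A minimal sketch (the table and column names are the ones defined just below; 42 stands in for whatever order the user selected):

-- Screen 1: the list of orders
SELECT id_order, orderfield1 FROM `order`;

-- Screen 2: the 0…N items of the selected order
SELECT id_item, orderitemfield1 FROM orderitem WHERE id_order = 42;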

For DB modeling, there are (at least) two main approaches to get things done. I’ll use MariaDB in this example to illustrate.

Approach 1 – Independent DB entities only linked by a foreign key

In this approach:

CREATE TABLE `order` (
  id_order INT,
  orderfield1 VARCHAR(255),
  PRIMARY KEY (id_order)
);
+-------------+--------------+------+-----+---------+-------+
| Field       | Type         | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| id_order    | int(11)      | NO   | PRI | NULL    |       |
| orderfield1 | varchar(255) | YES  |     | NULL    |       |
+-------------+--------------+------+-----+---------+-------+

and

CREATE TABLE `orderitem` (
  id_order INT,
  id_item INT, 
  orderitemfield1 VARCHAR(255),
  PRIMARY KEY (id_item),
  FOREIGN KEY (`id_order`) REFERENCES `order`(`id_order`) 
);
+-----------------+--------------+------+-----+---------+-------+
| Field           | Type         | Null | Key | Default | Extra |
+-----------------+--------------+------+-----+---------+-------+
| id_order        | int(11)      | YES  | MUL | NULL    |       |
| id_item         | int(11)      | NO   | PRI | NULL    |       |
| orderitemfield1 | varchar(255) | YES  |     | NULL    |       |
+-----------------+--------------+------+-----+---------+-------+

Approach 2 – Composite key in the “owned” entity

In this approach:

CREATE TABLE `order` (
  id_order INT,
  orderfield1 VARCHAR(255),
  PRIMARY KEY (id_order)
);
+-------------+--------------+------+-----+---------+-------+
| Field       | Type         | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| id_order    | int(11)      | NO   | PRI | NULL    |       |
| orderfield1 | varchar(255) | YES  |     | NULL    |       |
+-------------+--------------+------+-----+---------+-------+

and

CREATE TABLE `orderitem` (
  id_order INT,
  id_item INT,
  orderitemfield1 VARCHAR(255),
  PRIMARY KEY (id_order, id_item),
  FOREIGN KEY (`id_order`) REFERENCES `order`(`id_order`) 
);
+-----------------+--------------+------+-----+---------+-------+
| Field           | Type         | Null | Key | Default | Extra |
+-----------------+--------------+------+-----+---------+-------+
| id_order        | int(11)      | NO   | PRI | NULL    |       |
| id_item         | int(11)      | NO   | PRI | NULL    |       |
| orderitemfield1 | varchar(255) | YES  |     | NULL    |       |
+-----------------+--------------+------+-----+---------+-------+

My point is: the actual business use case is the only criterion for deciding whether we should go with approach 1 or approach 2. In my opinion, a “One to Many” (entity A – entity B) relationship can reflect two business requirements:

  • Entity B cannot exist without Entity A (examples: the order items of an order / shopping cart, the comments on a post, etc.)
  • Entity B can exist without Entity A (example: the 0..N users that have created/updated a single document, …); see the DDL sketch after this list.
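
To make that distinction concrete, here is a DDL sketch of both cases (post/comment and document/editor are hypothetical tables, not part of the schemas above), encoding the business rule through the nullability of the foreign key and its ON DELETE action:

-- “B cannot exist without A” (comments about a post): the FK is
-- mandatory, and child rows disappear together with their parent.
CREATE TABLE post (
  id_post INT,
  PRIMARY KEY (id_post)
);

CREATE TABLE comment (
  id_comment INT,
  id_post INT NOT NULL,
  PRIMARY KEY (id_comment),
  FOREIGN KEY (id_post) REFERENCES post(id_post) ON DELETE CASCADE
);

-- “B can exist without A”: the FK is optional, and it is simply
-- cleared when the parent goes away.
CREATE TABLE document (
  id_document INT,
  PRIMARY KEY (id_document)
);

CREATE TABLE editor (
  id_editor INT,
  id_document INT NULL,
  PRIMARY KEY (id_editor),
  FOREIGN KEY (id_document) REFERENCES document(id_document) ON DELETE SET NULL
);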

I argue that when modeling an “Entity B cannot exist without Entity A” one-to-many relationship, approach 2 is the best approach in most business use cases. Why? Just imagine that the orderitem table contains 1 billion records and that the user is displaying an order document. In approach 1, to make that display fast, an index will be created on the non-primary id_order column. So in addition to the 1-billion-row table, there will be another 1-billion-entry B-tree index. This clutters the database storage infrastructure. In approach 1, too, an error in the application server can result in a record in the orderitem table that breaks the business rules.
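
Spelled out, the extra structure approach 1 needs looks like this (a sketch; note that InnoDB already creates such an index implicitly when the FOREIGN KEY is declared, which is why id_order shows up as MUL in the DESCRIBE output above):

-- One extra B-tree with one entry per orderitem row; in InnoDB each
-- entry stores id_order plus the primary key value id_item.
CREATE INDEX idx_orderitem_id_order ON orderitem (id_order);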

In approach 2, the id_order column of the composite primary key already serves as an index for fetching the relevant entity B records in O(log n) time. There is no extra storage to add to the infrastructure.
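
This is easy to verify (42 is a placeholder order id): since the clustered index is ordered by id_order first, the screen-2 query is one B-tree descent followed by a sequential read of that order's rows.

EXPLAIN SELECT id_item, orderitemfield1
FROM orderitem
WHERE id_order = 42;
-- Expected plan: key = PRIMARY, type = ref; no secondary index involved.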

However, when modeling an “Entity B can exist without Entity A” one-to-many relationship, approach 1 might be the best approach, but only if most queries run over entity B alone.

So my question: do you think my analysis is correct, and if so, why do most DB developers choose the foreign-key approach over the composite-key one, regardless of the business reality?

Most ORMs strongly encourage approach 1 (using a foreign key), no matter what the functional requirement is and regardless of performance.


Solution

OK, a lot to unpack here.

    So in addition to the 1-billion-row table, there will be another 1-billion-entry B-tree index

Yes, in MySQL a secondary index would need to be created and maintained for this purpose. This holds for InnoDB (and for SQL Server as well), since both use the clustered-index model. To be honest, though, a billion-entry index nowadays is not trivial, but not huge either. I have a 900-million-row table in DB2 with 4 indexes (2-6 columns each). No big deal.
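
If you want to put an actual number on that overhead, InnoDB's persistent statistics expose the size of every index in pages. A minimal sketch, assuming innodb_stats_persistent is enabled (the default in recent MySQL and MariaDB versions):

-- Approximate on-disk size of each index of orderitem, in bytes
-- (for stat_name = 'size', stat_value is a count of pages).
SELECT index_name,
       stat_value * @@innodb_page_size AS approx_size_bytes
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name = 'orderitem'
  AND stat_name = 'size';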

    an error in the application server can result in a record in the orderitem table that breaks the business rules

No, both solutions are safe in this regard. The application (any app) won't be able to insert data that does not comply with the foreign key constraint, so no worries about this one. A bug in this area would be considered critical and would be fixed fast.
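
For instance, with the approach 1 schema above, an orphan insert is rejected by the engine itself (assuming no order with id_order = 999 exists):

-- Fails with ERROR 1452 'Cannot add or update a child row: a foreign
-- key constraint fails', so the bad record never reaches the table.
INSERT INTO orderitem (id_order, id_item, orderitemfield1)
VALUES (999, 1, 'orphan item');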

    do you think my analysis is correct...

Yes, your analysis is correct, albeit biased towards optimization. I totally agree with you, because my focus is also optimization. However, you need to be realistic and consider that many developers don't really understand the basic principles of it (it took me a couple of decades). Again, nothing wrong with that, because in most cases the engine can compensate for it (e.g. Oracle auto-indexing, DB2 auto-tuning, etc.).

    Most ORMs strongly encourage approach 1 ...

I've seen that most non-mature ORMs do encourage this approach, especially the ones based on JavaScript or TypeScript. Those are newer ORMs that are not comprehensive in their implementation and are actually buggy when composite keys are used. I think the recommendation is not theoretical but pragmatic, since they often crash with composite keys. I'm sure these bugs will be fixed in time.

I hope I was able to add more details to your interesting question.