database-design relational-database database-normalization

How should I normalize this database design?

I'm trying to design a database. I've got a design that seems to me to be normalized to Fourth Normal Form - but still I think is broken, and I can't for the life of me work out how to fix it.

Background: we have four types of tests, and a couple dozen tests of each type. We run the tests in batches, where each batch contains tests of only one type. So a test result belongs to a batch, and also belongs to a test. Which gives a database plan something like this:

The problem is that this design allows for a Result that is for a Test of Type A, but that Result is in a Batch for Type B.

One thing I cannot do is combine the Test and Batch tables in to one table. There is a new Batch each week, while a Test lasts for months or years. And a Batch can contain many Tests (though always of the same Type), and a Test is usually done many times in many Batches.

I could insert a many-to-many join between Test and Batch, but I can't immediately see how that would help anything.

Is there a clean way to re-organise this so that we don't have the circular join path? Is this necessary? Or desirable?

Or should I just go with what I've got, and stop worrying about it? :-)

[Edit 1] Note that the Test has details on how it runs, who fixes problems found, etc which remain constant over multiple Batches, so the Test must exist independantly of any Batches that it may (or may not) run in.

[Edit 2] It has been pointed out that it would be better to have a TestBatch table, which gives us this structure:

I agree this is a good idea, but that doesn't actually fix the problem. It just moves the problem from Result to TestBatch. We can now have a TestBatch that is for a Test of Type A, but that TestBatch has in a Batch for Type B.

[Edit 3] Thanks to @philip-kelley 's excellent suggestion, I believe we have an answer. First, we link TestBatch directly back to Type, thus:

This doesn't immediately fix the problem. In fact it makes it worse - there could now be one Type for the Test, a different Type for the Batch, and a third Type joined directly from the TestBatch.

But the second step is to change the foreign key from TestBatch to Test, so that it includes the Type as well as the TestID. And to change the foreign key to Batch to include the Type as well as the BatchID.

That way, we can be certain that the TestBatch has the same Type as the Test, and the Batch.

Solution

@HLGEM’s answer describes the logical model, with some physical model details. A physical implementation that supports and enforces your business rules would look something like this. (This is psuedo-code, showing only the key columns--you'd want to add columns for attributes like Name, Score, etc. Actual implementation details are system-dependent and can get a bit tricky, but any RDBMS should be able to support this. Note that all columns listed are NOT NULLable.)

CREATE TABLE TestType
  TestType    int
  <primary key on TestType>


CREATE TABLE Test
  TestId       int
  TestType     int
  <primary key on TestId>
  <foreign key into TestType on column TestType>


CREATE TABLE Batch
  BatchId      int
  TestType     int
  <primary key on BatchId>
  <foreign key into TestType on column TestType>


CREATE TABLE TestInBatch
  TestInBatchId  int
  TestId         int
  BatchId        int
  TestType       int
  <primary key on TestInBatchId>
  <unique constraint on TestId, BatchId>
  <foreign key on (TestId, TestType) into Test, columns (TestId, TestType)>
  <foreign key on (BatchId, TestType) into Batch, columns (BatchId, TestType)>


CREATE TABLE Result
  ResultId       int
  TestInBatchId  int
  <primary key on ResultId>
  <foreign key into TestInBatch on column TestInBatchId>