Search code examples
databasedatabase-designmany-to-manyone-to-many

What is the method to remember a "1 to many" relationship database change history?


I have basically two tables in the database, one is named "Valve", and the other is named "Channel". This is a basic "1 to many" relationship, which means one Valve can control many Channels. See the image below:

Valve and Channel

In this figure, you see four sub-figures. The first sub-figure (day0), it shows that Valve0(V0) is controlling 3 Channels(C0, C1 and C2), Valve1(V1) is controlling the Channel3(C3).

This is a very basic "1 to many" relationship, so I create two tables, one is named "ValveTable", and the other is named "ChannelTable". In the "ChannelTable", it has a foreign key named "ValveID" which points to the primary key of the "ValveTable".

But in my cases, the tree structure will change day by day. In the day1, the property of the C0 will changes, so do I need to make a new copy of C0 row in the "ChannelTable"? Because I need to remember the day0's tree hierarchy. Also, the C2 is connected to V1 in day1, so the connection is changed.

In the day2, C1 is removed from V0. And in day3, a new Channel C4 is added to the ChannelTable, so it got the new tree hierarchy.

I'm not sure how to design this whole database structure. Do I need to add a third table? just like a Conneciton Table or something like a "junction table" like a "many to many" relationship database?

Any good ideas? The whole picture is very large, and this is a very simplified question, the core issue is that I need to keep the tree structure change history in the database.

Thanks.

EDIT: Julio Di Egidio's answer give me an example of the "ChannelsPerDay" table, where the "DayNo" is a FK. But in-fact, in my figure, I wrote "day0, day1, day2 and day3" which is a simplification of the situation that the "Valve-Channel" relationship may be changed at some timestamp, for example, one channel may be deleted at the timestamp of "2023-11-24 10:31:21". So, I need find a way to actually use the timestamp as the field, not the "DayNo".


Solution

  • I need to keep the tree structure change history in the database.

    Then you need to store "channels per day".

    Namely, an immediate solution here is to add a field, say DayNo, to Channels that forms a composite PK with the ID, meaning that every entry in Channels represents uniquely "a channel in some specific day":

    Valves {
      ID  PK
    }
    
    Channels {
      ID       PK
      DayNo    PK
      ValveID     FK -> Valves.ID
    }
    

    One could then think of splitting Channels into Channels (just the list, including the "obsolete" ones), and -say- ChannelsPerDay, for a bit more normalization, but this adds extra complication that I would think is warranted only if Channels has to have more info about a channel than just an ID.

    In any case, for completeness, here is the corresponding structure, with Name fields to give the example more substance:

    Valves {
      ID  PK
      Name   UK  // assuming we want unique names
    }
    
    Channels {
      ID  PK
      Name   UK  // assuming we want unique names
    }
    
    ChannelsPerDay {
      ChannelID  PK FK1 -> Channels.ID
      DayNo      PK
      ValveID       FK2 -> Valves.ID
    }
    

    Notice in particular that that structure (but also the previous one) does support not recording "non-changes", namely there is no constraint that the table ChannelsPerDay is "complete" in some sense. Two considerations about that:

    1. should we want to enforce "completeness", as for relational constraints (as opposed to e.g. jobs that send emails...), I don't think we can do better than enforcing completeness table-level at the next insertion, i.e. where a record can be inserted if the data is "complete up to that point";

    2. OTOH, if it's fine to have those gaps (and, typically, I'd go this way), just the semantics changes, namely, it becomes a history of "updates", and we'd rather need adequate queries to reconstruct the "current state" (which of course we can do).

    Now, the second approach, a "history of updates", is the one you are interested in. It further splits into two possibilities, i.e. whether or not it is a requirement that channels can at periods be detached from all valves:

    • The structure I have given above assumes that a channel is always attached to some valve, there is in fact no way there to say that a channel gets detached from all valves (other than creating a dummy valve to which to attach "detached" channels, which is anyway not ideal). -- I have created a minimal playground, covering your Day0->Day1, up to a sample query to describe this case.

    • In case the requirement is that a channel can be at periods detached from all valves (as it seems to be in your case, looking at Day2 in the example), the structure has to change slightly: namely, the ValveID field in ChannelsPerDay (the "history" table) must become NULLable. And the (INNER) JOIN with Valves in the query in the playground can become an OUTER JOIN, depending on the specific output that is wanted. -- I have created another playground, covering from your Day0 to Day3, again up to a sample query to recover the state at some time.