Search code examples
database-designrelational

Modeling Relational Entities for Auditing Question States


This is something that made me doubt for a while so I thought it would be a good idea to post it here to find some insight, it's a relational database modeling problem/doubt

I have the following problem:

I have "questions" that must be in an specific "state", and all changes of state must be audited.

I've found two solutions to this, but I can't really see the difference between them, if there's any... what do you think.

Here's the image with both diagrams.

EDIT:

Option A: table "questions" should not contain state_id, and Question_State should not contain "id" field. Sorry for the mistakes.

EDIT2:

Thanks for all the real world examples and insight, but this was an academic problem, not real-world related :).

Diagrams


Solution

  • I think the gist of what you're asking is: Should the state of the question be based on an intermediary table between Questions and State that has a time component (A) or should the table be more static, but with a log-oriented history table on the side (B).

    (Note: If you wanted to do a pure version of (A), then Boofus is right, you probably wouldn't put the state_id in the Questions table as well, as it's redundant; but that would definitely be inconvenient because it would make simple queries to get questions in a particular state much harder. So you've got a hybrid version here.)

    In general, if the requirement of keeping historical information about the state is really just for audit purposes - that is, if it won't be regularly queried by the application itself - you're probably better off going with option B, because it's a little simpler (there's really just the one "Questions" table, with a reference table for the states, and a "log" table for previous states). I think that shows your intent a little better.

    However, if the application semantics are more complex (e.g., if you have queries like "show all questions that have been in state X within the last 24 hours ..."), then an approach like (A) might make more sense. It's essentially making the state of a question into a time-dependent fact. If you do that, just be aware that it complicates things - either all your queries are harder and have to consider time, or you have to worry about keeping the state_id on Questions in sync with the most recent state in the Questions table. If you go that route, maybe call it "current_state" or something on Questions, so it's clear that it's sort of derivative information.