Search code examples
architecturetimerdbmsversions

Design debate: what are good ways to store and manipulate versioned objects?


I am intentionally leaving this quite vague at first. I'm looking for discussion and what issues are important more than I'm looking for hard answers.

I'm in the middle of designing an app that does something like portfolio management. The design I have so far is

  • Problem: a problem that needs to be solved
  • Solution: a proposed solution to one or more problems
  • Relationship: a relationship among two problems, two solutions, or a problem and a solution. Further broken down into:
    • Parent-child - some sort of categorization / tree hierarchy
    • Overlap - the degree to which two solutions or two problems really address the same concept
    • Addresses - the degree to which a problem addresses a solution

My question is about the temporal nature of these things. Problems crop up, then fade. Solutions have an expected resolution date, but that might be modified as they are developed. The degree of a relationship might change over time as problems and solutions evolve.

So, the question: what is the best design for versioning of these things so I can get both a current and an historical perspective of my portfolio?

Later: perhaps I should make this a more specific question, though @Eric Beard's answer is worth an up.

I've considered three database designs. I'll enough of each to show their drawbacks. My question is: which to pick, or can you think of something better?

1: Problems (and separately, Solutions) are self-referential in versioning.

table problems
  int id | string name | text description | datetime created_at | int previous_version_id

  foreign key previous_version_id -> problems.id

This is problematic because every time I want a new version, I have to duplicate the entire row, including that long description column.

2: Create a new Relationship type: Version.

table problems
  int id | string name | text description | datetime created_at

This simply moves the relationship from the Problems and Solutions tables into the Relationships table. Same duplication problem, but perhaps a little "cleaner" since I already have an abstract Relationship concept.

3: Use a more Subversion-like structure; move all Problem and Solution attributes into a separate table and version them.

table problems
  int id

table attributes
  int id | int thing_id | string thing_type | string name | string value | datetime created_at | int previous_version_id

  foreign key (thing_id, thing_type) -> problems.id or solutions.id
  foreign key previous_version_id -> attributes.id

This means that to load the current version of a Problem or Solution I have to fetch all versions of the attribute, sort them by date and then use the most current. That might not be terrible. What seems really bad to me is that I can't type-check these attributes in the database. That value column has to be free-text. I can make the name column a reference into a separate attribute_names table that has a type column, but that doesn't force the correct type in the attributes table.

later still: response to @Eric Beard's comments about multi-table foreign keys:

Alas, what I've described is simplistic: there are only two types of Things (Problems and Solutions). I actually have about 9 or 10 different types of Things, so I'd have 9 or 10 columns of foreign keys under your strategy. I wanted to use single-table inheritance, but the Things have so little in common that it would be extremely wasteful to do combine them into one table.


Solution

  • Hmm, sounds kind of like this site...

    As far as a database design would go, a versioning system kind of like SVN, where you never actually do any updates, just inserts (with a version number) when things change, might be what you need. This is called MVCC, Multi-Value Concurrency Control. A wiki is another good example of this.