Search code examples
sqldatabase-designrdbms

SQL Create an Interim Table or Make One Long Table


I am importing product data from my suppliers, and adding it to an internal product database. The data first gets imported to a dedicated Supplier Product Table, in its raw, non-validated form. I want to keep the original data intact so that I can link back to it, but I also need to clean/validate the data to system conformity before adding it to the Internal Product Table. Note this is a many-to-one relationship, in that several suppliers' products may link to one internal product if those supplier entries end up being the same after validation.

The question is: where should I store the interim data during the "cleaning" process -- i.e. before all the data entries are modified to validated form and can create a new internal product? (Note: I want to keep the original supplier data intact, so I can't change the data directly.) It seems to me there are two main options: A) Create additional columns on the Supplier Product Table to store the validated entries while the data is getting cleaned up, or B) Use an Interim Table that will hold validated entries while it is getting cleaned, and then push it to the Internal Product Table once fully validated. Once it has been pushed, the Interim Table can be cleared.

Option A seems easier to setup upfront, but also seems to conflict with the rules of normalization, in that the validated data on the Supplier Product Table will end up storing duplicated information to that of the Internal Product Table.

Any thoughts on this would be greatly appreciated.


Solution

  • B) Use an Interim Table that will hold validated entries while it is getting cleaned,

    B. It's just a much more flexible and robust way to model the data for this process.

    For perspective, what you're doing is a kind of "Master Data Management", and an MDM solution, you would usually see these things separated out. For instance in SQL Server Master Data Services you have separate

    1. Staging Tables

    2. Master Data "work tables" where validation and matching happens

    3. Publishing Views to project the finished and validated rows.

    Matching the "same" product from multiple source systems is done in the "work tables", typically by introducing a "Golden Record", and linking the versions of the product from various source systems to the "Golden Record" and applying promotion and survivorship rules.