Search code examples
sql-serverdatabase-designjoinnormalization

Perplexing Complexity - Tricky Table Join


I've been struggling with this specific set of tables for some time. There are three tables (used to be two, but it was necessary to add a third). The tables are Request, PartInfo, and Status. Request is used by the customer via a form that enters data into the table. Status is for our service agents to keep track of progress on customer requests. PartInfo is the new table, containing common data accessed by both parties.

The trick is that with each request, there is a running log of changes to that request which are stored in the same table, and linked to the original request in that series via a self-joining key called FirstRequestID (which I'll abbreviate as FID). The same is true for the Status table. Here is my basic table structure as I have currently designed it (Note: It's not too late to change the architecture if there is a better approach):

Request          PartInfo          Status
-------          --------          ------
ID               ID                ID
FID              FID               FID
PartInfoID       PartNum           PartInfoID
ProductID        Revision          StatusID
CategoryID       Description       Comments

Now say I want to display the information on a particular request (including part info and status changes) in a ASP.NET GridView table. The "particular request" is identified by the FID.

Question: How can I ensure that when I'm looking at either the Request history or the Status history, it's always pulling the proper information from the PartInfo (shared) table? In other words, what's the best way to link these three tables with the proper relationship without having 50 different junction tables to account for all the exceptions?


Solution

  • I apologize, but my first take on this schema was “this is a mess”. This data needs to be normalized. Unfortunately, there’s not enough information here to determine how to best do so. Based on your descriptions and part names, I come up with the following ideas.

    • The main entity is the Request.
    • Reqeusts contain Products
    • Requests contain Categories (unless the Category is an attribute of a Product)
    • Products contain Parts (unless it’s Categories that contain Parts)
    • The implication is that a Requested Product is associated with an arbitrary number of Parts (as opposed to a standardized set of Parts for that Product).
    • Status is used to track change in state of a Request over time (and is not completely dependant upon Products, Categories, or Parts)

    This suggests the following tables

    REQUESTS
    RequestId
    DateTimeCreated
    
    PRODUCTS
    ProductId
    --  Add CategoryId, if it’s a Product attribute
    
    CATEGORIES
    CategoryId
    
    REQUESTPRODUCTS
    RequestProductId
    RequestId
    ProductId
    DateTimeAdded
    --  Add StatusId if a status entry must be made every time a product is requested
    --  Note extra surrogate key. ReqeustId + ProductId + DateTimeAdded should be the
    --   natural key, unless two identical products can be requested at the same time
    --   (in which case add an “Quantity” column)
    
    
    REQUESTCATEGORIES
    RequestId
    CategoryId
    DateTimeAdded
    --  Suorrogate key optional, as it’s not referenced by other tables
    --  Drop, if categories are product attributes
    
    PARTS
    PartId
    
    REQUESTPRODUCTPARTS
    RequestProductId
    PartId
    --  Add StatusId if a status entry must be made every time a part is requested
    
    STATUS
    StatusId
    RequestId
    DateTimeAdded
    Comments
    

    There’s a log of ways this could go. You may end up with a lot of “junction” tables, but then your data will have referential integrity and accurate SQL queries become much, much simpler to write.