Search code examples
databasedatabase-designnormalizingdatabase-relations

Conflicting desires in Database Design, with fields of two similar functions


Okay, so I'm making a table right now for "Box Items".

Now, a Box Item, depending on what it's being used for/the status of the item, may end up being related to a "Shipping" box or a "Returns" box.

A Box Item may be defective:if it is, a flag will be set in the Box Item's row (IsDefective), and the Box Item will be put in a "Returns" box (with other items to be returned to that vendor). Otherwise, the Box Item will eventually be put into a "Shipping" box (with other items to be shipped). (Note that Shipping and Returns boxes have their own tables: there's not one common table for all boxes... though maybe I should consider doing that if possible as a third possibility?)

Maybe I'm just not thinking clearly today, but I started questioning what should be done in this situation.

My gut tells me that I should have a separate field for each possible relation, even if only one of the relations can happen at any given time, which would make the schema for Box Items look like:

BoxItemID Description IsDefective ShippingBoxID ReturnBoxID etc...

This would make the relations clear, but it seems wasteful (since only one of the relations will be used at any time). So then I thought I could have just one field for the BoxID, and determine which BoxID it's referring to (a Shipping or a Returns Box ID) based on the IsDefective field:

BoxItemID Description IsDefective BoxID etc...

This seems less wasteful, but doesn't sit right with me. The relation isn't obvious.

So, I put it to you, database gurus of Stackoverflow. What would you do in this situation?

EDIT: Thank you everyone for your input! It's given me a lot to think about. For one, I'm going to use an ORM next time I start a project like this. =) For two, since I'm not right now, I'll bite the four bytes and use two fields.

Thanks everyone again!


Solution

  • I'm with Psychotic Venom and mattlant.

    Going the polymorphic route (having to figure out which table your foreign key points to based on the contents of another field) is going to be a pain. Coding the constraints for that maybe tough (I'm not sure most databases would support that natively, I think you'd have to use a trigger).

    Do items ever move between the tables? Sticking with two tables with identical definitions where one is for returns and one is for shipping may be the easiest route. If you want to stick with the definition you first proposed (with the two separate fields) is perfectly reasonable.

    "Premature optimization is the root of all evil" and all that. While it seems wasteful, remember what you're storing. Since they are IDs they are probably just integers, maybe 4 bytes. Wasting four bytes per record is basically nothing. In fact, due to padding to put things on even addresses or other such things it may be "free" to put that extra field in there. It all depends on the DB design.

    Unless you have a very good reason to go the polymorphic route (like you're on an embedded system with little memory or you have to replicate across some really slow 9600bps link) it probably won't be worth the headaches you can end up with. Having to write all those special cases into your queries can get annoying.

    Quick example: doing a join between two tables where if you want to join is based on if the isDefective flag is set is going to be a pain. Being able to just use one of the two columns alone is probably enough of a hassle you may save, at least for me.