Search code examples
mysqlsqldatabasenulldatabase-normalization

If I wanted to not use NULL, how would I normalize this table?


line_item (id, description, parent_id, product_id);
product (id, model);

Example Hierarchy of Line Items

Product A       //"parent" "line item"
    Option 1    //child "line item"s of "parent"
    Option 2    

Current DB Data

line_item:
id | description | parent_id | product_id
-----------------------------------------
 1 |   Product A |      NULL |         20  //has no parent "line item"
 2 |    Option 1 |         1 |         -1  //no product associated with option
 3 |    Option 2 |         1 |         -1  //has "line item" parent with id == 1
product
id | model
--------------
20 | Product A

Question

I am not exactly sure how to get rid of "NULL" from parent_id. Note that I use -1 in my product_id as well, where I use it in a similar way, saying "there is no "product" or "parent" associated with that particular line item record.

Do I need to get rid of it in the first place?


Solution

  • First of all, you should replace -1 with null. That is a completely legit value.

    Null-values actually don't play a role when considering normalization, because you can only speak of functional dependency if you have non-null-values, so you don't have to worry about normalization, your table is fine.

    Your table is an implementation of a "disjunct is-a"-relation, that means, that a line_item is either an "option"-line (with the value parent_id) or a "product"-line (with the value product_id and e.g. quantity, ...) and has the common values id and description. Disjunct means it can't be both at the same time, so the columns for the other type are set to null (that is why you "have to" replace -1 with null).

    The usual implementation of a "disjunct is-a" is to add a type-column, that defines which of these possibilities it is (the reason is more of a practical kind, e.g. for constraint-checks). You don't need it here, since it is clear to you what kind of line it is without this field, but I added it to emphasize what you are actually implementing and that you did everything in a standard way:

    line_item:
    id | description | line_type | parent_id | product_id
    -----------------------------------------------------
     1 |   Product A |   Product |      NULL |         20  
     2 |    Option 1 |    Option |         1 |       NULL  
     3 |    Option 2 |    Option |         1 |       NULL  
    

    Last remark: There are other possible implementations of "is-a", some of them are getting rid of the null (while introducing other problems), but for disjunct options, this is the usual one. But since it would answer your original question ("If I wanted to not use null..."), I'll add this one too:

    line_item:
    id | description 
    -----------------
     1 |   Product A 
     2 |    Option 1 
     3 |    Option 2 
    
    
    line_item_product:
    line_item_id | product_id
    ---------------------------
               1 |         20  
    
    line_item_option:
    line_item_id |  parent_id
    ---------------------------
               2 |          1 
               3 |          1 
    

    You have a table for the common columns (for both types option and product) and a seperate table for the two possibilities with their specific columns.

    This would get rid of your null.

    The biggest practical problem with this is, that you have to have a more complicated check for the "primary key" in the two tables line_item_option and line_item_product: You are not allowed to add line 1 in line_item_option when it already is in line_item_product, but mysql has no easy check for that. That's why this kind of splitting is only used when you have a "non-disjunct is-a"-relation (e.g. if the lines could be both product and option at the same time).