database, database-normalization

What is the remedy of lossy-join decomposition?


I know what lossless decomposition and lossy decomposition are. With a lossy decomposition we can get back wrong information. But what is the remedy? I think that if we join on the primary key it will not be lossy.

Am I right? If I am, will it also be lossless?


Solution

  • To answer your question: to achieve a lossless decomposition you have to split along the functional dependencies, so that the attributes shared by the two fragments form a key of at least one of them. It seems you already know the (formal) theory behind it, so I'll try to give you a glimpse of why that is true.

    Say you have a relation R

    R(author_id, author_name, post_id, post_date, post_title)
    

    with FDs

    author_id -> { author_name }
    post_id -> { post_date, post_title }
    

    Now, a good decomposition would be

    (author_id, author_name, post_id) (post_id, post_date, post_title)
    

    because post_id determines post_date and post_title, thus becoming a key of r2. Since post_id is a key in r2, each post_id value appears in exactly one row of r2, so every row of r1 matches exactly one row and we can join back safely. Note that the primary key of the original relation was (author_id, post_id), which is more than we actually need for the join.

    On the other hand, a bad decomposition would be

    (author_id, author_name, post_id, post_date) (post_date, post_title)
    

    because there is no FD post_date -> post_title, so post_date is not a key in r2 and can therefore have duplicate values. It should be obvious from an intuitive perspective that joining on a potentially duplicated value will get us a ton of phantom rows, as the sketch below makes concrete.
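
    A quick way to see the difference is to simulate both decompositions. The following is a minimal Python sketch with made-up sample rows (the values and the project/natural_join helpers are illustrative assumptions, not part of the original question): it projects R onto each pair of fragments, joins them back, and compares the result with R.

    # Original relation R(author_id, author_name, post_id, post_date, post_title),
    # populated with hypothetical sample rows.
    R = {
        (1, "alice", 10, "2024-01-01", "Intro"),
        (1, "alice", 11, "2024-01-01", "Follow-up"),
        (2, "bob",   12, "2024-01-02", "Intro"),
    }

    def project(rows, cols):
        # Project onto the given column positions, dropping duplicate tuples.
        return {tuple(row[c] for c in cols) for row in rows}

    def natural_join(r1, r2, k1, k2):
        # Join r1 and r2 on r1[k1] == r2[k2]; keep r1's columns plus
        # r2's remaining columns, in order.
        out = set()
        for a in r1:
            for b in r2:
                if a[k1] == b[k2]:
                    out.add(a + tuple(v for j, v in enumerate(b) if j != k2))
        return out

    # Good decomposition: r1(author_id, author_name, post_id), r2(post_id, post_date, post_title)
    r1 = project(R, [0, 1, 2])
    r2 = project(R, [2, 3, 4])
    print(natural_join(r1, r2, k1=2, k2=0) == R)   # True: post_id is a key of r2

    # Bad decomposition: s1(author_id, author_name, post_id, post_date), s2(post_date, post_title)
    s1 = project(R, [0, 1, 2, 3])
    s2 = project(R, [3, 4])
    joined = natural_join(s1, s2, k1=3, k2=0)
    print(joined == R)                             # False: post_date repeats in s2
    print(joined - R)                              # the phantom rows introduced by the join

    The good join reproduces R exactly because the shared attribute post_id is a key of r2; in the bad split the shared attribute post_date determines nothing, so the join manufactures author/title combinations that were never in R.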