Search code examples
sqlormdatabase-normalization

At what point does data normalization become ludicrous?


I often find myself questioning whether I'm taking the right approach in trying to plan for future expansibility when creating databases and relations.

I have the following situation:

  1. I have a Donor table and a Recipient table. Both tables share common information such as first_name, last_name, email_address, date_of_birth, etc. Both seem to, if you'll excuse my object-oriented language, share a common abstract type of Person. It's possible that someone who is at one point a Recipient may later become a Donor by means of giving a donation, so it's important that information isn't duplicated across tables. Should I opt for an inheritance pattern, or should I just foreign key Donors and Recipients to a Person table?

  2. Initially, I was thinking of simply mapping properties like email_address and street address properties directly into the things that need them, but then the possibility may arise that a person would have multiple email addresses or mailing addresses (ie: home, work, etc.). What that means is that we have a model somewhat like this:

    create table person(id int primary key auto increment, ..., 
        default_email_address);
    
    create table email_address(id int primary key auto increment, 
        email varchar(255), name varchar(255), is_default bool, person_id int);
    

    This makes things a bit complicated, as you can imagine. The name field also involves a list of default values as well as allowing custom input. I can't just make it an enum field, because the possibility exists that someone will have a lot of emails to add that could all be different... (this is the point at which I scream out "IS IT EVEN WORTH IT ANYMORE!?!?" and get frustrated with the project)

I guess what this really boils down to is the following: at what point does data normalization become ludicrous? My goal here is to create a really good as-forward-compatible-as-possible data model that I won't kick myself for creating later.


Solution

  • at what point does data normalization become ludicrous?

    At the point that it stops modelling the actual requirements.

    To take your examples:

    • With the Donor and Recipient tables, if it is highly likely that any one person will become both, then it does make sense to separate out to a Person entity. If this is rare, it doesn't.

    • With the email_address and street_address situations, it depends whether you do need to store multiples or not (what is the expectation?). You may want to store separate versions per business unit (say shipping_address vs billing_address).