Search code examples
sqldatabase-normalization

SQL normalization: Mathematical?


I'm taking a class in database design. The curriculum focuses heavily on nomalization. The steps and methods speak for themselves, however I find the intuition aspect frustrating. Is there a mathematical way to handle data normalization, where one can assign the properties and crunch a mathematically certain conclusion?


Solution

  • I'll give another real world example in a discussion regarding addresses.

    You are architect in charge of redesigning a legacy app with an old database that contains address information. The DB currently has 1 field for all address information.

    Our address looks something like this:

    Address
    ___________________________________
    Bob Robertson 123 Broad Way Springfield IL 62701
    

    Unacceptable! Let's normalize this a bit to remove some of that redundancy, so in our second pass, we break it up into a few fields:

    Address1        Address2        City          State   Zip
    ________        ________        ____          _____   ____
    Bob Robertson   123 Broad Way   Springfield   IL      62701
    

    Well, you say, there are multiple addresses within a state, so we need to break that out into a lookup table, and we can save tons of space on that city name, and there is a one to many relationship between streetname and all of the numbers that exist on that street.

    We also need to represent the 9 digit zip codes, and one 5 digit zip code has multiple 4 digit suffixes, so that needs it's own table too in order to fully normalize this relationship, boy am I clever!

    Also, we might have multiple robertsons, and multiple Bobs that we ship things to, so we need a cross reference lookup table for that as well.

    At this point, our table may look like this:

    FirstNameID     LastNameID    StreetID  StreetNum     CityID    StateProvinceID   ZipID   4DigitSuffix
    __________      _________     ________  ___________   ______    _______________   _____   ___________
    3452            1257          45        234           990       32                123     1234
    

    If you presented this to most rational developers, requiring 6 or more joins just to get address information, you might get run out of town on a rail. In many cases, there is an optimal middle ground that satisfies business needs and also speed/ease of development.