Search code examples
databasedata-warehousesnowflake-schema

Star snd snowflake schema in OLAP systems


I was of the impression that in OLAP , we try to store data in a denormalized fashion to reduce the number of joins and make query processing faster. Normalization that avoids data redundancy was more for OLTP systems.
But then again, 2 of the common modelling approaches (star and snowflake schema) are essentially normalized schemas.
Can you help me connect the dots?


Solution

  • Actually, that's very perceptive and the vast majority of people accept it. The truth is that a star is partially denormalized - the dimension tables are highly denormalized; they typically come from joining together a lot of related tables into one. A well designed fact table, however, is normalized - Each record is a bunch of values identified by a single, unique, primary key which is composed of the intersection of a set of foreign keys.

    Snowflake schemas are, as you surmised, even more normalized. They effectively take the dimension tables and break them into small values that are all joined together when needed. While there are constant arguments over whether this is better or worse than a star, many folks believe that these are inexpensive joins and, depending on your thinking, may be worth it.

    Initially, snowflakes were sold as a way of saving disk space because they do take up less room than dimension tables but disk space is rarely an issue nowadays.

    I personally prefer a hybrid approach that allows me to build a few levels of dimension table that ultimately can provide referential integrity to both my atomic level data but also to my aggregate fact tables.