Search code examples
data-warehouseolapdenormalizationdimensional-modelingoltp

How deep to go when denormalising


I denormalising a OLTP database for use in a DWH. At the moment I am denormalising studygroups.

  • Each studygroup has a key pointing towards 1 project.
  • Each project has a key pointing towards 1 department.
  • Each department has a key pointing towards 1 university.
  • Each universityhas a key pointing to 1 city.

Now I know that you are supposed to denormalize the sh*t out your OLTP but in this dwh department will be a dimension on its own. This goes for university also. Would it suffise to add a key from studygroup pointing at department or is it wiser to denormalize as far as you can and add all attributes from the department and all attributes from its M:1 related tables to the dimension studygroup? Even when department and university will be dimensions by themselves?

In other words: how far/deep do you go when denormalizing?


Solution

  • The key concept behind a dimensional model is:

    • Keep your fact tables in 3NF (third normal form);
    • De-normalize your dimensions into 2NF (second normal form)

    So ideally, the only joins you should have in your model are the joins between fact tables and relevant dimensions.

    As part of this philosophy:

    • Avoid "snow flake" designs, where dimensions contain keys to other dimensions. It's always possible to come up with a data model that allows the same functionality as the snow flakes, without violating 3NF/2NF rule;
    • Never have any direct joins between 2 separate dimensions (i.e, department and study group) directly. All relations among dimensions must be resolved via fact tables;
    • Never have any direct joins between 2 separate fact tables. Any relations among fact tables must be resolved via shared dimensions.

    Finally, consider that dimensional design, besides optimization of the data for querying, serves a second important purpose: it's a semantic model of the business (or whatever else it represents). So, when making decisions about combining data elements into dimensions and facts, consider their "logical affinity" - they should make intuitive sense to the end users. If you have hard times explaining to a BI analyst the meaning of your dimension or fact table, most likely you've made a modeling mistake.

    For example, in your case you should consider logical relations between universities, departments, study groups, etc. It's very likely that University/Department form a natural hierarchy. If so, they should belong to the same dimension. Study group, on the other hand, might not - let's assume, it's possible to form study groups across multiple universities and/or multiple departments. Such Many:Many relations are clear indication that they should be resolved via fact tables. In addition, relations between universities and departments are stable (rarely change), while study groups are formed and dissolved very often, and thus should be modeled separately.

    In general, if you see 1:1 or 1:M relations between dimensional elements, it's often an indication that they should be de-normalized into the same table (again, only if their combination makes logical sense). If the relations are M:M, most likely they belong to different tables (you can force them into the same table, but often such tables look like Frankenstein creatures).

    You can get much better help by making your question more specific - draw your dimensional model, post it, and ask for specific issues/challenges you have. For general concepts, books from Kimball and Inmon are your best friends.