Search code examples
sql-server-2008indexingforeign-keysdata-warehousestar-schema

Should I put a non-clustered index on these foreign keys in a fact table


Profile of the foreign keys

FK      Distinct Values      %
----    ---------------  ------
Id1     1                 0.1%
,Id2    4                 0.3%
,Id3    5                 0.3%
,Id4    6                 0.4%
,Id5    6                 0.4%
,Id6    95                6.1%
,Id7    97                6.2%
,Id8    1423             90.7%

All foreign keys already make up the clustered Primary Key. This fact table is part of a star schema that includes 6 dimensions (Id's 6,7, and 8 reference the same date dimension).

Fact table currently has approx 1800 rows (incredibly small), and is expected to grow by that amount each month.

Should each foreign key have its own non-clustered non-unique single column index for facilitating joins? If so, why?

Each foreign key will be part of a clustered index (primary key) in its dimension table.

If indexes should be put on the foreign keys, then what should the fill factor and padding index be set to given the low cardinality of the columns?


Solution

  • Your profile doesn't really make sense with the "%" column - why are you finding the "percentage" of distinct values across fields? You need stats on the distribution of the distinct values - are 99% of the keys on Id8 the same ? are they evenly distributed? etc.

    Note that everything I'm saying here applies to larger tables. With 1800 rows / month, indexes are probably a waste of space and time for you to worry about.

    @jrara's "rule" about indexing all the dims is an easy rule to apply, but you can easily make mistakes if that's all you do. I don't want to use an oracle bitmap index on my 100mil row customer dimension, for example.

    Indexing depends on what the queries look like against your data. Indexes won't help if you are doing a full scan of the fact table to perform aggregation and grouping for "summary" reports. They will help when a user is trying to filter on an attribute of a dimension, and that filter results in you only having to look up a small percentage of the records from the fact table. Is there a main entry point to your table? Do people typically filter on an attribute of the "Id8" dimension, then want grouping on an attribute from the other dimensions?

    Essentially the answers to your questions are:

    Should each foreign key have its own non-clustered non-unique single column index for facilitating joins?

    In general, yes, so long as the dimension tables are small and the dim keys are relatively evenly distributed in the fact table. Usually it is worse to use a index access to get 99% of the fact table rows.

    what should the fill factor and padding index be set to given the low cardinality of the columns?

    Lowering the FILLFACTOR below 100% will cause slower index reads, since there are more (empty) pages in the index for the DB to read. Since a data warehouse is designed for fast selects, I don't really ever recommend that you adjust the fillfactor down.

    That being said, in a few cases adjusting your FILLFACTOR may make sense. If the the fact table is very large (hundreds of GB / TB), and index rebuilds take hours, and you might only rebuild indexes once a month or even less. In these cases you need to figure out how much data (as a percentage) that you'll be adding to the table each day, and set the fillfactor accordingly.