sorting amazon-redshift query-optimization

improving query performance for a table in redshift

I am new to Redshift and I worked with B-tree and bitmpa indexes in Oracle.

we have many tables in Star schema. Fact table with several fk columns to dimensiin tables. there us a dimension table amd we often use one column PART_NuMBER in where clause in Oracle. The is very high cardinality column almost all values are unique. We will do that in Redshift as well. so I suppose this column should be a good sort key. This table will be joined with fact table . So to use the most efficient merge join then the fk of the fact and pk of the dimension should be distkey and sort key. IfI define the pk as sort key then the column part_number will be not my leading sort key that can be used efficient as predicate in the where clause. In that case the table will be full scanned for one value or am I wrong?

what will be the best practice in such scenariis. The fact table has many other fk columns to another dimensions so the other could be defined in the compound key but will be used marge join when joining to another dimensions?

I am bit confused. In Oracle the bitmap could be defined for many columns and it doesn't matter the order of predicates in the where clause.

Solution

There seems to be a number of things to address in this post but let me cut to the likely answer.

I suspect that you need to define your dim tables as DISTSTYLE ALL and sort by part_number. There are some limits to doing this based on dim table size that depend on your cluster size / node type. Unless your dim tables are a significant percentage of you fact table in size you will be best off going this way.

The order of your WHERE predicates doesn’t matter but the order of your compound sort keys on the table does. I suspect you got these factors mixed. This is all about getting the most value out of the zone maps of each table.

As to your fact table you will want a set of sort keys that give you good rejection of blocks based on metadata. This is often not the fks to the dims but can be. It is common that a date or timestamp will be important as a sort key on your facts but will depend you your query patterns. Getting this optimized with take a bit of back and forth on your data on queries.