Search code examples
oracle11gdatabase-performance

How to pair low/high cardinality columns as composite indexes?


This question concerns to Oracle DB, so if there are general answers I would like to know. As I am discarding information from Derby/MySQL and other DBs regarding this subject.

Let's say I have several queries using the following columns on its WHERE clause:

Column | Cardinality | Selectivity
_______|__________________________
A      | low         | low
B      | high        | low
C      | low         | low
D      | high        | high
E      | low         | low
F      | low         | low
-- Queries
SELECT * FROM T WHERE A=? AND B=? 
SELECT * FROM T WHERE A=? AND B=? AND C=?
SELECT * FROM T WHERE A=? AND C=?
SELECT * FROM T WHERE A=? AND C=? AND D=?
SELECT * FROM T WHERE A=? AND E=? AND F=?
  1. Is there any benefit from pairing these columns (taking into account cardinality mixing) as composite indexes? If so, what is the logic to follow? I have understood this explanation but it is for SQL Server and it may behave differently.

  2. Is it worthwhile to do covering indexes instead of individual small composite indexes?

  3. Does it matter the column order of composite indexes? i.e:

-- Regardless the column order on the table creation.
CREATE INDEX NDX_1 ON T (A, C);
-- Versus:
CREATE INDEX NDX_1 ON T (C, A);
  1. Would this index be useful?
CREATE INDEX NDX_2 ON T(E, F); -- (low + low) Ignoring 'A' column.

Solution

  • A few things and bear in mind these are generalities

    • Generally you can only use the leading parts of an index. So looking at your examples

      If you have an index on ( A, B, C ) and you have a predicate on A and C, then only the index on A can be used. Now there are some cases where the non-leading part of an index can be used; you will see this in an execution plan as a SKIP-SCAN operation, but they are often sub-optimal. So you may want to have (A, C) and ( C, A )

    • A covering index can be useful, if you are not projecting columns other than those in the index.

    • Again generally, you do not usually want or need an index if the column has low selectivity. However, it's possible that you have two columns that individually have low selectivity, but have high selectivity when used in combination. (In fact, this is the premise of a bitmap index / star transformation in a dimensional model).

    • If a multi-column index is useful you may want to put the column with the lowest selectivity first and enable index compression. Index compression can save a huge amount of space in some cases and has very little CPU overhead.

    Finally, a SQL Monitor report will help you optimizing a sql statement when it comes to running it.