sql oracle-database indexing query-optimization

does querying an oracle table with a subset of index columns improve performance?

Say there is table that has 6 columns that make up a unique index.
If a query joins on only 3 of those columns will there be any index help on this query? Would creating a new nonunique index on just those 3 columns help even more? Thank you

Solution

The short answer is yes, your query that has predicates on three of the six indexed columns can benefit from the index. But how much it can benefit depends on the order of columns in that index, and where the three you need are in the list.

Starting from the left-hand size of the index's column list, your query will seek (perform a binary search) progressively on each indexed column that your predicate has an equality operator on, until it reaches an indexed column that you do not have an equality predicate on. At that point, any further columns to the right that you may need will have to be scanned by following the linked list of index leaf nodes, which means no more binary seeking (hence, slower).

How much this will impact you depends on how important those additional columns are to the right of the gap. If you have already achieved most of your row filtering, then maybe you don't care. But if there's a very selective column to the right of the gap, that'll hurt. It's especially important that your query have an equality predicate on the leading (left-most) column of the index, or Oracle will either not use the index at all, or will attempt an inefficient skip-scan. How bad that is depends on the # of distinct values of the column(s) it skips (the more, the worse).

So, imagine an employee table with a concatenated index on:

(LAST_NAME, FIRST_NAME, STATE, GENDER, DEPARTMENT, EMPLOYEE_ID)

And let's say this is your query:

SELECT * FROM employee WHERE last_name = 'Smith' and State = 'CA'

This will do a binary search on last_name but then will have to scan all the leaf nodes within that last name testing for the state filter. It cannot continue using a binary seek method for the state, because the unused first_name came between them in the index.

SELECT * FROM employee WHERE last_name = 'Smith' and first_name = 'John' AND gender = 'M'

This will seek on both last_name and first_name and then will scan through all the leaf blocks with these values applying the filter for gender. But that's probably okay, because by the time you have isolated last name and first name, gender isn't going to reduce your row count noticeably, especially if first names tend to already correlate to gender.

SELECT * FROM employee WHERE first_name = 'John'

This query will either not use the index at all, because you aren't applying a predicate to the leading/left-most column (last_name), but it could do a skip-scan, which would require separate binary seeks for first_name within every single last_name value. If you have 100 different John's, that's 100 seek operations instead of 1. Not great. But then, how often does a query like this run? How long does it take? If it's still fast or fast enough for how infrequently it runs, then maybe we don't care.

Imaging this one:

SELECT * FROM employee WHERE employee_id = 1234

That index will be of no help at all. While employee_id is indexed, it's not the leading column and not even close to the left-edge, so skip scanning isn't going to be helpful either. A very selective column like employee_id would need to be moved to the left edge of the index or given it's own index in order to facilitate queries that need it.

Lastly, let's say your predicates are on the leading three columns of the index:

SELECT * FROM employee WHERE last_name = 'Smith' and first_name = 'John' AND state = 'CA'

This will perform just as well with the existing index even though there are additional columns it doesn't use to the right, than it would a new index with only these three columns. Well, nearly just as well. If the predicates match many (i.e. many thousands) rows and you end up having to scan leaf blocks to read them all, the more index entries per block the fewer blocks you'd need to scan, so the fewer unneeded columns in the index the less I/O would be required because you can fit more entries per block. But usually this is of minimal importance, especially with caching in play. It pales in comparison with the importance of using the leading columns for selective predicates.

So in summary, exactly what works best for you depends on your data, its cardinalities, and the queries against it.