Tags: oracle-database, performance, indexing, syntax, creation

Oracle Index creation syntax


I'm facing some performance problems with SQL queries that use aggregate functions, so I thought it would be a good idea to understand exactly how index creation works. I came across one thing I couldn't understand: what is the difference between these two sets of CREATE INDEX statements?

1.) CREATE INDEX FIELD1_INDEX ON TABLE1 (FIELD1) ONLINE TABLESPACE XXX;
    CREATE INDEX timeofrequest_INDEX ON TABLE1 (timeofrequest) ONLINE TABLESPACE XXX;

2.) CREATE INDEX COMBINED_INDEX ON TABLE1 (FIELD1, FIELD2) ONLINE TABLESPACE XXX;

The reason I'm asking is that I have a query like:

SELECT SUM(field1) FROM table1 WHERE timeofrequest BETWEEN 1 AND 2;

Table1 contains a lot of data, so this query takes over 20 seconds to complete. To shorten the runtime, I tried approach 1 and created an index for each affected field. However, it's still too slow. From the execution plan, I see that the timeofrequest index is used, but not the one for field1. Would it help if I created a combined index with a single statement? Does it make any difference?


Solution

  • Your first two indexes differ from your third in that each of them covers only a SINGLE column. The third is built on two columns, which is known as a composite index, and it can STILL be used even if your query supplies only the first column listed in the index. For a single table access, Oracle normally picks just ONE B-tree index (it can combine indexes in some plans, for example through a bitmap conversion or an index join, but you should not count on that), and the CBO (cost-based optimizer) will figure out the best option to use.

    In your first example, you have TWO indexes against individual columns. Your second example is a SINGLE index with two columns. With the first example, a given select statement will normally use one index or the other, but not both (the CBO will decide which one based on your query...).

    Example:

    SELECT SUM(field1) FROM table1 WHERE timeofrequest BETWEEN 1 AND 2;
    

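    This will only attempt to use the index you have placed against timeofrequest. In fact it will attempt to do a range scan. The FIELD1 index is no help here, because field1 appears only in the SELECT list and not in the WHERE clause; Oracle still has to visit the table to fetch field1 for each matching row.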
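
    To confirm which access path Oracle actually picks for this query, you can ask for the execution plan. A minimal sketch, using the table and column names from the question:

    ```sql
    -- Store the optimizer's plan for the query in the plan table.
    EXPLAIN PLAN FOR
      SELECT SUM(field1) FROM table1 WHERE timeofrequest BETWEEN 1 AND 2;

    -- Print the plan that was just explained.
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
    ```

    If the timeofrequest index is used, the plan would typically contain an INDEX RANGE SCAN step on that index plus a table access by rowid; the exact operation names vary between Oracle versions.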

    Example for second index:

    SELECT SUM(field1) FROM table1 WHERE FIELD1 = 'x';
    

    Assuming you did not have an INDEX against only 'FIELD1', this would utilize the COMBINED_INDEX you have in place, and fetch all records per the first field of your index.

    If you had:

    SELECT SUM(field1) FROM table1 WHERE FIELD2 = 'y';
    

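    This would normally NOT use the combined_index for a range scan, as the index is ordered first by FIELD1, then by FIELD2. (Oracle can sometimes fall back to an index skip scan, which is typically only worthwhile when FIELD1 has few distinct values, or to a full scan of the index.)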

    If you had:

    SELECT SUM(field1) FROM table1 WHERE FIELD1 = 'x' and FIELD2 = 'y';
    

    This would still utilize the COMBINED_INDEX, but here you provide a finer level of granularity. It would utilize the full index (field1 and field2...) to return your row(s) much faster. Why is this useful? Well, it's possible you could have an index on FIELD1, and simply searching on ONLY field1 would still yield many thousands (or tens of thousands, or millions...) of records, so being able to provide another column as part of the index will simply help target the records you need... which in turn provides higher selectivity.
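
    Applied to the query from the question, the same idea suggests a composite index that leads with the filter column and also contains the summed column. This is a sketch, not a guaranteed fix: the index name is made up, the tablespace XXX is the placeholder from the question, and whether the CBO chooses the index depends on your data and statistics.

    ```sql
    -- Hypothetical index name; the leading column matches the WHERE clause,
    -- the second column holds the value being summed.
    CREATE INDEX timeofrequest_field1_idx
      ON table1 (timeofrequest, field1)
      ONLINE TABLESPACE XXX;

    -- With both columns in the index, Oracle may be able to answer this
    -- query from the index alone, without visiting the table rows.
    SELECT SUM(field1) FROM table1 WHERE timeofrequest BETWEEN 1 AND 2;
    ```

    The column order matters here: putting timeofrequest first lets the range predicate drive a range scan, while field1 simply rides along in the index entries.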

    IMPORTANT note! Remember the CBO figures out the best approach. If the indexed column has low cardinality (few distinct values relative to the number of rows), the CBO may decide not to use the index at all, because a full table scan is cheaper. If the column has high cardinality and the index is still not used, the optimizer statistics may be stale: gather fresh statistics on the table and its indexes, then rerun your explain plan to see if you get new results. Stale statistics are common when much of the data in the table has changed since it was last analyzed. Keeping statistics current is important and easily overlooked. On multiple occasions I've seen incredible improvements at work simply from re-analyzing a table and/or its indexes.
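
    One way to see what the optimizer currently believes about the table is to query the data dictionary. This sketch assumes TABLE1 lives in your own schema and was created with an unquoted (and therefore uppercase) name:

    ```sql
    -- When were the table's statistics last gathered, and how many rows
    -- does the optimizer think the table has?
    SELECT table_name, num_rows, last_analyzed
      FROM user_tables
     WHERE table_name = 'TABLE1';

    -- Number of distinct values the optimizer has recorded per column.
    SELECT column_name, num_distinct, last_analyzed
      FROM user_tab_col_statistics
     WHERE table_name = 'TABLE1';
    ```

    A LAST_ANALYZED date that is old or NULL, or a NUM_ROWS value far from the real row count, is a hint that the statistics need refreshing.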

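    Please read up on analyzing tables and on gathering statistics with DBMS_STATS, which Oracle recommends over the older ANALYZE command for optimizer statistics.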

    https://docs.oracle.com/cd/B28359_01/appdev.111/b28419/d_stats.htm
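
    As a rough illustration of the DBMS_STATS usage mentioned above, the following anonymous block gathers statistics for the table from the question. It assumes the table is in the schema of the connected user; adjust ownname otherwise.

    ```sql
    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(
        ownname => USER,      -- assumption: the table is in your own schema
        tabname => 'TABLE1',
        cascade => TRUE       -- also gather statistics for the table's indexes
      );
    END;
    /
    ```

    After it completes, rerun the explain plan to check whether the CBO now chooses a different access path.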

    Another option: if you have massive amounts of data and your "time of request" column has only a small set of distinct values (maybe in your table of 50 million records there are only 20 distinct values...), you might set up a partitioned table and build your indexes on top of that. Queries that filter on the partitioning column then only need to read the matching partitions, which can greatly improve query times.
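
    A minimal sketch of what such a partitioned table might look like. The table name, partition names, index name, and column types are all assumptions made for illustration, since the question does not show the table definition:

    ```sql
    -- Hypothetical partitioned copy of the table; column types are assumed.
    CREATE TABLE table1_part (
      field1        NUMBER,
      field2        VARCHAR2(30),
      timeofrequest NUMBER
    )
    PARTITION BY LIST (timeofrequest) (
      PARTITION p_req_1 VALUES (1),
      PARTITION p_req_2 VALUES (2),
      PARTITION p_other VALUES (DEFAULT)  -- catch-all for every other value
    );

    -- A LOCAL index is split along the same partition boundaries as the table.
    CREATE INDEX table1_part_field1_idx ON table1_part (field1) LOCAL;
    ```

    List partitioning fits the "few distinct values" case described above; for a column with many values, such as a real timestamp, range partitioning would be the more usual choice.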