For the following query, ITEM_CTY_TAR_T is a voluminous table with 10 million+ rows. I have created a primary index on this table with columns ITEM_NO, ITEM_TYPE, GA_CODE_IMP, GA_TYPE_IMP, TAR_NO and VALID_DATE_FROM, but when I explain the plan for this query, the optimizer does a FULL scan on the table. What index should be created to prevent the full scan?
SELECT
t.item_no, t.item_type, t.BU_CODE_RU, t.tar_no,
VALID_DATE_FROM
FROM
(SELECT
tar.item_no, tar.item_type,
org.BU_CODE_RU, tar.tar_no, tar.VALID_DATE_FROM,
ROW_NUMBER() OVER (PARTITION BY tar.ITEM_NO, tar.ITEM_TYPE, org.BU_CODE_RU
ORDER BY VALID_DATE_FROM DESC, tar.tar_no DESC) AS RowNumber
FROM
(SELECT
item_no, item_type, tar_no, valid_date_from,
GA_CODE_IMP
FROM
ITEM_CTY_TAR_T tar
WHERE
GA_TYPE_IMP = 'CTY'
AND DATE '2023-08-06' >= TRUNC(tar.VALID_DATE_FROM)
AND DATE '2023-08-06' <= NVL(trunc(tar.VALID_DATE_TO), DATE'9999-12-31')
AND NVL(trunc(tar.DELETE_DATE), DATE '9999-12-31') > DATE '2023-08-06') tar,
(SELECT
BU_CODE_RU, GA_CODE_CTY
FROM
store_t
WHERE
BU_TYPE = 'SO' AND BU_CODE_RU IS NOT NULL) org
WHERE
org.GA_CODE_CTY = tar.GA_CODE_IMP) t
WHERE
rownumber = 1
You don't want indexes to be used for this query. Indexes are only faster than full table scans when the ratio of target rows to the table size is extremely small, and that is not likely to be the case here.
You're looking for records active as of a given day, but in most type-2 tables that's a significant portion, if not a majority, of the table. Neither GA_TYPE_IMP = 'CTY'
nor BU_TYPE = 'SO'
is likely to narrow down your results enough to justify index use. A full table scan is optimal in this situation.
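You can sanity-check this reasoning yourself before touching any indexes. A sketch, reusing the table and predicates from the question, that measures how selective the filters actually are:

```sql
-- Rough selectivity check: what fraction of ITEM_CTY_TAR_T survives the
-- filters as of the target date? (Table and column names taken from the
-- question's query.)
SELECT COUNT(*)                                   AS matching_rows,
       (SELECT COUNT(*) FROM ITEM_CTY_TAR_T)      AS total_rows
FROM   ITEM_CTY_TAR_T tar
WHERE  GA_TYPE_IMP = 'CTY'
AND    DATE '2023-08-06' >= TRUNC(tar.VALID_DATE_FROM)
AND    DATE '2023-08-06' <= NVL(TRUNC(tar.VALID_DATE_TO), DATE '9999-12-31')
AND    NVL(TRUNC(tar.DELETE_DATE), DATE '9999-12-31') > DATE '2023-08-06';
```

If matching_rows comes back as more than roughly 1% of total_rows, no index is going to beat the full scan.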
I suggest that instead of looking at indexes, you hint the query for a modest amount of parallelism:
SELECT /*+ parallel(4) */
t.item_no, t. . . .
Now suppose your records undergo constant change: a typical row version lasts only a day or so, you have thousands of versions per key, asking for records active as of Aug 6 2023 returns only a tiny percentage of the table (much less than 1%), and you typically query a very recent date (like today or yesterday). In that case those from/to dates become important, and that may push you toward an index after all. Your best bet then is a combined index, but it won't perform well: it will start at the right place on the first column (VALID_DATE_FROM
) but will have to scan the remainder of the index to test the other column (VALID_DATE_TO
):
CREATE INDEX i_index ON ITEM_CTY_TAR_T (VALID_DATE_FROM,VALID_DATE_TO);
And to use it at all you'd have to remove your date functions on at least the leading column:
AND DATE '2023-08-06' + 1 > tar.VALID_DATE_FROM
AND DATE '2023-08-06' <= NVL(tar.VALID_DATE_TO, DATE '9999-12-31')
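Whether the optimizer actually picks the index up after this rewrite is something only the plan can tell you. A sketch of checking it (using the i_index name from the CREATE INDEX above; substitute your full query for the simplified one shown):

```sql
-- Explain the rewritten predicates and inspect the chosen access path.
EXPLAIN PLAN FOR
SELECT item_no, item_type, tar_no, valid_date_from
FROM   ITEM_CTY_TAR_T tar
WHERE  DATE '2023-08-06' + 1 > tar.VALID_DATE_FROM
AND    DATE '2023-08-06' <= NVL(tar.VALID_DATE_TO, DATE '9999-12-31');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Look for an INDEX RANGE SCAN on i_index in the output; if you still see TABLE ACCESS FULL, the optimizer has costed the scan as cheaper.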
Because your predicate on the first indexed column is an inequality, there's no hope of seeking on the second column so having a function there doesn't really hurt. But scanning the remainder of the index if you are looking at only recent dates might be cheaper than scanning the full table since the narrower index, especially only the tail end of it, will be smaller.
Index scanning, however, uses inefficient single-block reads following a linked list of non-contiguous blocks, and each matching row found is another inefficient single-block read from the table segment. This can really add up. A full table scan, by contrast, uses much more efficient multiblock reads in a single pass, can bypass the buffer cache, and can use parallelism, so it is much better at reading large volumes of data. Which one wins can only really be known by testing in your environment with your data.
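One way to run that test (a sketch, using standard Oracle access-path hints and the i_index name from above) is to force each path in turn and compare elapsed time and buffer gets:

```sql
-- Force the full scan (with the suggested parallelism) ...
SELECT /*+ full(tar) parallel(4) */ COUNT(*)
FROM   ITEM_CTY_TAR_T tar
WHERE  DATE '2023-08-06' + 1 > tar.VALID_DATE_FROM
AND    DATE '2023-08-06' <= NVL(tar.VALID_DATE_TO, DATE '9999-12-31');

-- ... then force the index, and compare the two runs.
SELECT /*+ index(tar i_index) */ COUNT(*)
FROM   ITEM_CTY_TAR_T tar
WHERE  DATE '2023-08-06' + 1 > tar.VALID_DATE_FROM
AND    DATE '2023-08-06' <= NVL(tar.VALID_DATE_TO, DATE '9999-12-31');
```

Run each a couple of times to warm the cache before trusting the numbers.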