I am using Oracle 10g. Here is my query
select * from Entries
where RefKey in (select RefKey
from Entries
where KeyStat = 1)
and RefKey = Key;
Here RefKey, Key and KeyStat all are indexed. The table is partitioned on another column which is not being used here. In this query, i am selecting master key (if RefKey = Key then master) currently active (KeyStat = 1). Here is the execution plan of this query using SQLTools 1.21 RC3.
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 270 | 218K (1)| 00:43:37 | | |
| 1 | NESTED LOOPS SEMI | | 1 | 270 | 218K (1)| 00:43:37 | | |
| 2 | PARTITION RANGE ALL | | 1 | 262 | 218K (1)| 00:43:37 | 1 | 12 |
|* 3 | TABLE ACCESS FULL | ENTRIES | 1 | 262 | 218K (1)| 00:43:37 | 1 | 12 |
|* 4 | TABLE ACCESS BY GLOBAL INDEX ROWID| ENTRIES | 10M| 77M| 3 (0)| 00:00:01 | ROWID | ROWID |
|* 5 | INDEX RANGE SCAN | IND_ENTR_REFKEY| 1 | | 2 (0)| 00:00:01 | | |
----------------------------------------------------------------------------------------------------------------------
I am concerned about ID = 3 "TABLE ACCESS FULL". If all the columns being used in this query are indexed then why oracle is doing full table scan.
How this can be optimized? If i put some values in inner query, it returns much faster.
To explain why the sub-query is necessary: I am selecting the whole batch having at least one active key. Refkey is not unique; for instance:
Key=1, RefKey=1, Stat=1
Key=2, RefKey=1, Stat=0
Key=3, RefKey=2, Stat=1
"I am concerned about ID = 3 "TABLE ACCESS FULL". If all the columns being used in this query are indexed then why oracle is doing full table scan."
The optimizer is ignoring the index on KEYSTAT. I would guess this is because KEYSTAT is not very selective (relatively few distinct values) and/or because those values are evenly distributed across the entire range of the ENTRIES table. If a query is going to hit pretty much every block in a table, FULL TABLE SCAN is the best path.
This guess is validated by the increased speed you get by filtering the sub-query.
As the others have suggested, refactoring your statement to remove the sub-query would be the best way to improve performance.
"There would be millions of entries having KeyStat= 0, only few in 1000s will have 1 so using index will be beneficial."
Skewed datadistribution is often a source of performance problems. You see, the thing is, the database doesn't know that KEYSTAT=1 is vastly more selective than KEYSTAT=0. Unless we tell it, which is why you might want to consider creating histograms when you gather statistics on that index. Find out more.
Note that histograms can cause problems as well as solve them, especially with queries which use bind variables instaead of literals. So benchmark in a sandpit before putting them into production.