Search code examples
oracle-databaseperformanceindexinglinguistics

Oracle linguistic index not used when SQL contains parameter with LIKE


My schema (simplified):

CREATE TABLE LOC
(
   LOC_ID      NUMBER(15,0) NOT NULL,
   LOC_REF_NO  VARCHAR2(100 CHAR) NOT NULL
)
/

CREATE INDEX LOC_REF_NO_IDX ON LOC
(
   NLSSORT("LOC_REF_NO",'nls_sort=''BINARY_AI''') ASC
)
/

My query (in SQL*Plus):

ALTER SESSION SET NLS_COMP=LINGUISTIC NLS_SORT=BINARY_AI
/

VAR LOC_REF_NO VARCHAR2(50)
BEGIN
  :LOC_REF_NO := 'SPDJ1501270';
END;
/

-- Causes full table scan (i.e, does not use LOC_REF_NO_IDX)
SELECT * FROM LOC WHERE LOC_REF_NO LIKE :LOC_REF_NO||'%';

-- Causes index scan (i.e. uses LOC_REF_NO_IDX)
SELECT * FROM LOC WHERE LOC_REF_NO LIKE 'SPDJ1501270%';

That the index is not used has been confirmed by doing an AUTOTRACE (EXPLAIN PLAN) and the SQL just runs slower. Tried a number of thing without success. Anyone got any idea what is going on? I am using Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit.

Update 1:

Note that the index is used when I use an equals with a parameter:

SELECT * FROM LOC WHERE LOC_REF_NO = :LOC_REF_NO; 

Explain Plan:

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |     1 |    93 |     5   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| LOC            |     1 |    93 |     5   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | LOC_REF_NO_IDX |     1 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(NLSSORT("LOC_REF_NO",'nls_sort=''BINARY_AI''')=NLSSORT(:LOC_REF_NO,'nls_
              sort=''BINARY_AI'''))

Whereas

SELECT * FROM LOC WHERE LOC_REF_NO LIKE :LOC_REF_NO||'%';

Explain Plan:

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 50068 |  3471K|  5724   (1)| 00:01:09 |
|*  1 |  TABLE ACCESS FULL| LOC  | 50068 |  3471K|  5724   (1)| 00:01:09 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("LOC_REF_NO" LIKE :LOC_REF_NO||'%')

Dumbfounded!

Update 2:

The reason we are using NLSSORT on an index is to make Oracle queries case insensitive and this was the general recommendation. Previously we use functional indexes with NLS_UPPER. The strange thing that is that the index is always used, parameter or not, as shown below.

So if table is as above, LOC_REF_NO_IDX index removed and this one added:

CREATE INDEX LOC_REF_NO_CI_IDX ON LOC
(
   NLS_UPPER(LOC_REF_NO) ASC
)
/

The all of the following use the index:

ALTER SESSION SET NLS_COMP=BINARY NLS_SORT=BINARY;

SELECT * FROM LOC WHERE NLS_UPPER(LOC_REF_NO) LIKE :LOC_REF_NO||'%';

    -------------------------------------------------------------------------------------------------
    | Id  | Operation                   | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
    -------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |                   | 50068 |  5329K|  5700   (1)| 00:01:09 |
    |   1 |  TABLE ACCESS BY INDEX ROWID| LOC               | 50068 |  5329K|  5700   (1)| 00:01:09 |
    |*  2 |   INDEX RANGE SCAN          | LOC_REF_NO_CI_IDX |  9012 |       |    43   (0)| 00:00:01 |
    -------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

       2 - access(NLS_UPPER("LOC_REF_NO") LIKE :LOC_REF_NO||'%')
           filter(NLS_UPPER("LOC_REF_NO") LIKE :LOC_REF_NO||'%')

So for some reason when using LIKE with a parameter on a linguistic index, the Oracle optimizer is deciding not to use the index.


Solution

  • According to Oracle support note 1451804.1 this is a known limitation of using LIKE with NLSSORT-based indexes.

    If you look at the execution plan for your fixed-value query you see something like:

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - access(NLSSORT("LOC_REF_NO",'nls_sort=''BINARY_AI''')>=HEXTORAW('7370646A313530
                  3132373000')  AND NLSSORT("LOC_REF_NO",'nls_sort=''BINARY_AI''')<HEXTORAW('7370646A313
                  5303132373100') )
    

    Those raw values evaluate to spdj1501270 and spdj1501271; those are derived from your constant string, and any values matching your like condition will be in that range. That parse-time transformation has to be based on a constant value, and doesn't work with a bind variable or an expression, presumably because it's evaluated too late.

    See the note for more information, but there doesn't seem to be a workaround unfortunately. You might have to go back to your NLS_UPPER approach.


    Previous explanation applies generally but not in this specific case, but kept for reference...

    In general, with the fixed value the optimiser can estimate how selective your query is when it parses it, because it can know roughly what proportion of index values match that value. It may or may not use the index, depending on the actual value you use.

    With the bind variable it comes up with a plan via bind variable peeking:

    In bind variable peeking (also known as bind peeking), the optimizer looks at the value in a bind variable when the database performs a hard parse of a statement.

    When a query uses literals, the optimizer can use the literal values to find the best plan. However, when a query uses bind variables, the optimizer must select the best plan without the presence of literals in the SQL text. This task can be extremely difficult. By peeking at bind values the optimizer can determine the selectivity of a WHERE clause condition as if literals had been used, thereby improving the plan.

    It uses the statistics it has gathered to decide if any particular value is more likely than others. That probably isn't going to be the case here, especially with the like. It's falling back to a full table scan becuse it can't determine when it does the hard parse that the index will be more selective most of the time. Imagine, for example, that the parse decided to use the index, but then you supplied a bind value of just S, or even null - using the index would then do much more work than a full table scan.

    Also worth noting:

    When choosing a plan, the optimizer only peeks at the bind value during the hard parse. This plan may not be optimal for all possible values.

    Adaptive cursor sharing can mitigate this, but this query may not qualify:

    The criteria used by the optimizer to decide whether a cursor is bind-sensitive include the following:

    • The optimizer has peeked at the bind values to generate selectivity estimates.

    • A histogram exists on the column containing the bind value.

    When I mocked this up with a small-ish amount of limited data, v$sql reported both is_bind_sensitive and is_bind_aware as 'N'.