Search code examples
sqlindexingsqliteexplain

SQL INDEX not used on WHERE ABS(x-y) < k condition, but used on y - k < x < y + k condition


I have a query involving couples of rows which have a less-than-2-hours time-difference (~0.08333 days):

SELECT mt1.*, mt2.* FROM mytable mt1, mytable mt2 
                    WHERE ABS(JULIANDAY(mt1.date) - JULIANDAY(mt2.date)) < 0.08333

This query is rather slow, i.e. ~ 1 second (the table has ~ 10k rows).

An idea was to use an INDEX. Obviously CREATE INDEX id1 ON mytable(date) didn't improve anything, that's normal.


Then I noticed that the magical query CREATE INDEX id2 ON mytable(JULIANDAY(date))

  1. didn't help when using:

    ... WHERE ABS(JULIANDAY(mt1.date) - JULIANDAY(mt2.date)) < 0.08333
    
  2. didn't help when using:

    ... WHERE JULIANDAY(mt2.date) - 0.08333 < JULIANDAY(mt1.date) < JULIANDAY(mt2.date) + 0.08333
    
  3. ... but massively improved the performance (query time happily divided by 50 !) when using:

    ... WHERE JULIANDAY(mt1.date) < JULIANDAY(mt2.date) + 0.08333
          AND JULIANDAY(mt1.date) > JULIANDAY(mt2.date) - 0.08333
    

Of course 1., 2. and 3. are equivalent since mathematically,

|x-y| < 0.08333 <=> y - 0.08333 < x < y + 0.08333
                <=> x < y + 0.08333 AND x > y - 0.08333

Question: Why are solutions 1. and 2. not making use of INDEX whereas solution 3. is using it?


Note:

  • I'm using Python + Sqlite sqlite3 module

  • The fact solutions 1. and 2. are not using the index is confirmed when doing EXPLAIN QUERY PLAN SELECT ...:

    (0, 0, 0, u'SCAN TABLE mytable AS mt1')
    (0, 1, 1, u'SCAN TABLE mytable AS mt2')
    

    The fact solution 3. is using the index is shown when doing EXPLAIN QUERY PLAN SELECT ...:

    (0, 0, 1, u'SCAN TABLE mytable AS mt2')
    (0, 1, 0, u'SEARCH TABLE mytable AS mt1 USING INDEX id2 (<expr>>? AND <expr><?)')
    

Solution

  • I believe that the inclusion of AND is the reasoning as per :

    The WHERE clause on a query is broken up into "terms" where each term is separated from the others by an AND operator. If the WHERE clause is composed of constraints separate by the OR operator then the entire clause is considered to be a single "term" to which the OR-clause optimization is applied.

    The SQLite Query Optimizer Overview

    It may be worthwhile running ANALYZE to see if that improves matters.

    As per the comment:

    I think the previously added paragraph can clarify why ABS(x-y) < k is not using index, and why x < y + k is using it, don't you think so? Would you want to include this paragraph? [All terms of the WHERE clause are analyzed to see if they can be satisfied using indices. To be usable by an index a term must be of one of the following forms: column = expression, column IS expression, column > expression ...

    The following has been added.

    To be usable by an index a term must be of one of the following forms:
    column = expression
    column IS expression
    column > expression
    column >= expression
    column < expression
    column <= expression
    expression = column
    expression > column
    expression >= column
    expression < column
    expression <= column
    column IN (expression-list)
    column IN (subquery)
    column IS NULL

    I'm not sure if it would work with a BETWEEN (e.g. WHERE column BETWEEN expr1 AND expr2).