I have a query involving couples of rows which have a less-than-2-hours time-difference (~0.08333 days):
SELECT mt1.*, mt2.* FROM mytable mt1, mytable mt2
WHERE ABS(JULIANDAY(mt1.date) - JULIANDAY(mt2.date)) < 0.08333
This query is rather slow, i.e. ~ 1 second (the table has ~ 10k rows).
An idea was to use an INDEX
. Obviously CREATE INDEX id1 ON mytable(date)
didn't improve anything, that's normal.
Then I noticed that the magical query CREATE INDEX id2 ON mytable(JULIANDAY(date))
didn't help when using:
... WHERE ABS(JULIANDAY(mt1.date) - JULIANDAY(mt2.date)) < 0.08333
didn't help when using:
... WHERE JULIANDAY(mt2.date) - 0.08333 < JULIANDAY(mt1.date) < JULIANDAY(mt2.date) + 0.08333
... but massively improved the performance (query time happily divided by 50 !) when using:
... WHERE JULIANDAY(mt1.date) < JULIANDAY(mt2.date) + 0.08333
AND JULIANDAY(mt1.date) > JULIANDAY(mt2.date) - 0.08333
Of course 1., 2. and 3. are equivalent since mathematically,
|x-y| < 0.08333 <=> y - 0.08333 < x < y + 0.08333
<=> x < y + 0.08333 AND x > y - 0.08333
Question: Why are solutions 1. and 2. not making use of INDEX whereas solution 3. is using it?
Note:
I'm using Python + Sqlite sqlite3
module
The fact solutions 1. and 2. are not using the index is confirmed when doing EXPLAIN QUERY PLAN SELECT ...
:
(0, 0, 0, u'SCAN TABLE mytable AS mt1')
(0, 1, 1, u'SCAN TABLE mytable AS mt2')
The fact solution 3. is using the index is shown when doing EXPLAIN QUERY PLAN SELECT ...
:
(0, 0, 1, u'SCAN TABLE mytable AS mt2')
(0, 1, 0, u'SEARCH TABLE mytable AS mt1 USING INDEX id2 (<expr>>? AND <expr><?)')
I believe that the inclusion of AND
is the reasoning as per :
The WHERE clause on a query is broken up into "terms" where each term is separated from the others by an AND operator. If the WHERE clause is composed of constraints separate by the OR operator then the entire clause is considered to be a single "term" to which the OR-clause optimization is applied.
The SQLite Query Optimizer Overview
It may be worthwhile running ANALYZE
to see if that improves matters.
As per the comment:
I think the previously added paragraph can clarify why ABS(x-y) < k is not using index, and why x < y + k is using it, don't you think so? Would you want to include this paragraph? [All terms of the WHERE clause are analyzed to see if they can be satisfied using indices. To be usable by an index a term must be of one of the following forms: column = expression, column IS expression, column > expression ...
The following has been added.
To be usable by an index a term must be of one of the following forms:
column = expression
column IS expression
column > expression
column >= expression
column < expression
column <= expression
expression = column
expression > column
expression >= column
expression < column
expression <= column
column IN (expression-list)
column IN (subquery)
column IS NULL
I'm not sure if it would work with a BETWEEN
(e.g. WHERE column BETWEEN expr1 AND expr2
).