Search code examples
sqldatabasesqliteselectquery-optimization

SQLite query optimization: tasks selection overlapping given time window


I have an SQLite DB in my application which is used to store temporal tasks data (events with "begin" and "end" timestamps). I need to optimize the query of tasks overlapping the given time window (including tasks partially or fully overlapping the window). Note that simple query of all tasks fully fitting in window works pretty fast, but since I need also get partially and fully overlapping tasks it's not exactly what I need. In example below I create table named "tasks" and complex index "tasks_index" by "begin" and "end" columns.

CREATE TABLE "tasks" ( "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, "begin" BIGINT NOT NULL, "end" BIGINT NOT NULL )
CREATE INDEX "tasks_index" ON "tasks" ( "begin", "end" )

Tasks table:

----+-------+-------+
 id | begin | end   |
----+-------+-------+
 1  | 2     | 6     |
 2  | 9     | 14    |
 3  | 16    | 22    |
 4  | 24    | 28    |
 5  | 30    | 34    |
 6  | 36    | 42    |
 7  | 44    | 50    |
 8  | 9     | 27    |
 9  | 29    | 42    |
 10 | 4     | 47    |

Tasks on time-line:

   [-------------------10---------------------]
        [--------8--------] [-----9------]
 [-1-]  [--2-] [--3--] [-4-] [-5-] [--6--] [--7--]
----------(++++++++++++++++++++++++++)-------------->
          11                         21          time

Query with time window from 11 to 21:

SELECT id FROM tasks WHERE end > 11 AND begin < 21
Result: { 2, 3, 4, 5, 6, 8, 9, 10 }

NOTE: tasks 3-5 fully fitting in window, tasks 2, 6, 8, 9 partially overlapping with window and task 10 is fully overlapping the window

PROBLEM: I've noticed that when I have large amount of tasks (~7 millions) and query them using small window located close to the right end of timeline the query becomes extremely slow and it works slower as the window comes closer to the right end. Query plan shows that search of "tasks" is using "tasks_index" only partially using just "begin" column, while the rest of the query seems to be performed without using the index and becomes slow when first part of selection returns large set of data.

EXPLAIN QUERY PLAN
SELECT id FROM tasks WHERE end > 11 AND begin < 21
SEARCH TABLE tasks USING INDEX tasks_index (begin<?) - end is not used from index!

Could you propose a different index or DB structure organization to optimize such kind of queries? If this is some kind of SQLite limitation, could you propose other in-process database engine?


Solution

  • I am not an expert of Sqlite, but what you need is probably the SQLite R*Tree Module

    Please refer to this question as well: SQLite and range query