Search code examples
sqlsqlitequery-plannercovering-index

Does a covering index have an effect on JOIN + BETWEEN operations


I am wondering about how a covering index can help in the case where you first JOIN on a column, and then have BETWEEN constraints on a second and third column. Would a combined index on these three have a positive effect?

Example (fiddle).

CREATE TABLE tbl1(firstcol TEXT);
CREATE TABLE tbl2(firstcol TEXT, secondcol INTEGER, thirdcol INTEGER);

INSERT INTO tbl1 VALUES ('hello');
INSERT INTO tbl1 VALUES ('good morning');
INSERT INTO tbl2 VALUES ('hello', '1', '1');
INSERT INTO tbl2 VALUES ('hello', '5', '210');
INSERT INTO tbl2 VALUES ('bonjour', '999', '1');
INSERT INTO tbl2 VALUES ('hello', '4', '20');

CREATE INDEX myfirstindex ON tbl1 (firstcol);
CREATE INDEX mymultiindex ON tbl2 (firstcol, secondcol, thirdcol);

SELECT *
FROM tbl1
JOIN tbl2 USING(firstcol)
WHERE
    tbl2.secondcol BETWEEN 0 AND 100
    AND tbl2.thirdcol BETWEEN 0 AND 50

If I call for EXPLAIN QUERY PLAN, then I see that it has an effect but only on the first and second col. So why isn't the third included here? Why does the optimizer choose to not make use of the joint index on all three columns?

id  parent  notused detail
3   0   0   SCAN tbl1
5   0   0   SEARCH tbl2 USING COVERING INDEX mymultiindex (firstcol=? AND secondcol>? AND secondcol<?)

Solution

  • From The SQLite Query Optimizer Overview/The BETWEEN Optimization:

    If a term of the WHERE clause is of the following form:
    expr1 BETWEEN expr2 AND expr3
    Then two "virtual" terms are added as follows:
    expr1>= expr2 AND expr1 <= expr3

    This means that your WHERE clause will be interpreted as:

    WHERE tbl2.secondcol >= 0 AND tbl2.secondcol <= 100
      AND tbl2.thirdcol >= 0 AND tbl2.thirdcol <= 50
    

    From The SQLite Query Optimizer Overview/WHERE Clause Analysis:

    If an index is created using a statement like this:
    CREATE INDEX idx_ex1 ON ex1(a,b,c,d,e,...,y,z);
    Then the index might be used if the initial columns of the index (columns a, b, and so forth) appear in WHERE clause terms.
    The initial columns of the index must be used with the = or IN or IS operators.
    The right-most column that is used can employ inequalities.
    For the right-most column of an index that is used,
    there can be up to two inequalities that must sandwich the allowed values of the column between two extremes....

    From the above I understand that only the first 2 inequalities, the first BETWEEN, will be used by the index.