Search code examples
scylla

Why the key prefix optimization doesn't work with secondary index on a clustering column?


Scylla DB implements, so-called "key prefix optimization" for secondary indexes, which eliminates filtering if a part of the primary key is specified. E.g. it's possible to execute SELECT * FROM A WHERE a = 'a' AND b = 'a' AND d = 'a'; on table A.

CREATE TABLE A (
    a text,
    b text,
    c text,
    d text,
    PRIMARY KEY(a,b,c)
);
CREATE INDEX A_index ON A (d);

But it doesn't work if A.d is a clustering column. E.g. as in table B below.

CREATE TABLE B (
    a text,
    b text,
    c text,
    d text,
    PRIMARY KEY(a,b,c,d)
);
CREATE INDEX B_index ON B (d);

The above SELECT query fails with the error:

InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"

ScyllaDB 3.0.1.


Solution

  • thanks for finding an interesting corner case :)

    The problem is that the second query restricts clustering columns (b, d), which in itself does not form a clustering key prefix. Of course, d is indexed, so what should happen is using a in key prefix optimization and d as an indexed column.

    Instead, it's wrongfully decided that (b, d) does not form a prefix, so it's discarded from optimization candidates, without taking into account that d has an index.

    This simplification will be fixed, I created a bug tracker issue here: https://github.com/scylladb/scylla/issues/4178