I have a table with columns a,b,c,d,e, and my SQL looks like this:
select a, b, c, d, e
from table
where
upper(a) in (a1)
and b in (b1, b2, b3,..., b1000)
and c in (c1, c2, c3,..., c10000)
with all 3 columns registered index, but the SQL execute is slow with the filter b in (b1, b2, b3,...)
. I tried to remove filter for b and filter for c separately where both test cases give similar data size, but the SQL without b can run very fast, like < 500 ms, with filter for b it takes > 2.5s.
The columns b and c are all string type, only difference is that values of c is all in length 5, but column b has various string length (most in 5 length, but 2% have over 20 chars in the string value).
I tried to tune the inline size but does not make any difference. Is there any way to improve the speed? Seems like index is not as efficient as the one for other columns
I define the table schema as following:
CacheConfiguration<AffinityKey<String>, Object> dataCacheCfg = new CacheConfiguration<>();
dataCacheCfg.setName(tableName);
QueryEntity queryEntity = new QueryEntity(AffinityKey.class, Object.class)
.setTableName(tableName);
queryEntity.addQueryField(a, "String.class", null);
queryEntity.addQueryField(b, "String.class", null);
queryEntity.addQueryField(c, "String.class", null);
queryEntity.addQueryField(d, "String.class", null);
queryEntity.addQueryField(e, "String.class", null);
...
List<QueryIndex> queryIndices = new ArrayList<>();
queryIndices(new QueryIndex(a))
queryIndices(new QueryIndex(b))
queryIndices(new QueryIndex(c))
queryEntity.setIndexes(queryIndices);
dataCacheCfg.setQueryEntities(List.of(queryEntity));
ignite.getOrCreateCache(dataCacheCfg);
And the EXPLAIN of the sql is
SELECT
"__Z0"."b" AS "__C0_0",
"__Z0"."d" AS "__C0_1",
"__Z0"."e" AS "__C0_2",
"__Z0"."f" AS "__C0_3",
"__Z0"."c" AS "__C0_4",
"__Z0"."g" AS "__C0_5",
"__Z0"."a" AS "__C0_6"
FROM "my_table"."MY_TABLE" "__Z0"
/* my_table.MY_TABLE_B_ASC_IDX: b IN('b1', 'b2', ..., 'b1000') */
WHERE ("__Z0"."c" IN('c1', 'c2', ..., 'c10000'))
AND (("__Z0"."b" IN('b1', 'b2', ..., 'b1000'))
AND (UPPER("__Z0"."a") = 'a1'))
There are some tricks that can help you with your query:
1)You are using three fields in the WHERE statement. It will be good to create the index with all three fields inside.
2)Different index fields can have different selectivity. Just imagine that index for boolean field will separate the entries on two possible branches (true and false). This is bad index. So you need to have most "selective" fields in your index.
First of all check how many unique values you have for your fields:
select count(distinct(a)) from TABLE
select count(distinct(b)) from TABLE
select count(distinct(c)) from TABLE
Let's imagine that you have 10 unique values for b, 1000 for a and 100000 for c. In this case index for the field c will work better.
So you can create the following index:
CREATE INDEX table_c_a_b_idx ON TABLE (c,a,b)
Also it can explain why your index for B works not ok.
2)In case all three fields have the +- same selectivity, the first field in the index must be a field that has no ranges, since it only has one value. In your case it is field a. The order of b and c is not important in this case.
CREATE INDEX table_a_b_c_idx ON TABLE (a,b,c)
3)Also indexes can be inefficient if they cannot be covered by inline size. Please read:
In case your B string might be quite long, you might need to increase the inline size:
Index inline for only b = binary object’s hash(5) + your average/max string length (5-20)
Index inline for only a,b,c = binary object’s hash(5) + a average/max string length (5) + b average/max string length (from 5 to 20) + c average/max string length (5)
As far as I know Ignite uses only 10 bytes for strings, so if your strings are longer than 10, you may increase the inline size.
CREATE INDEX table_b_idx ON TABLE (b) INLINE_SIZE 25;
CREATE INDEX table_a_b_c_idx ON TABLE (a,b,c) INLINE_SIZE 35;
Maybe this information can help you with your SQL optimization