Search code examples
indexingclickhouse

Clickhouse doesn't use skip index with Final modifer in query


I have Added a skip index on a field (status_id) in clickhouse. It is used and working when I use my query as below:

EXPLAIN INDEXES=1
SELECT COUNT(*) FROM mytable WHERE 
created_date='2022-10-05' and status_id=123;

The result:

Expression ((Projection + Before ORDER BY))
  Aggregating
    Expression (Before GROUP BY)
      Filter (WHERE)
        SettingQuotaAndLimits (Set limits and quota after reading from storage)
          ReadFromMergeTree
          Indexes:
            MinMax
              Keys: 
                created_date
              Condition: (created_date in [19270, 19270])
              Parts: 3/138
              Granules: 346/8840
            Partition
              Keys: 
                toYYYYMM(created_date)
              Condition: (toYYYYMM(created_date) in [202210, 202210])
              Parts: 3/3
              Granules: 346/346
            PrimaryKey
              Condition: true
              Parts: 3/3
              Granules: 346/346
            Skip
              Name: status_index
              Description: set GRANULARITY 1
              Parts: 3/3
              Granules: 14/346

But when I use the final modifier, the skip index is not used anymore:

EXPLAIN INDEXES=1
SELECT COUNT(*) FROM mytable FINAL WHERE 
created_date='2022-10-05' and status_id=123;

The result:

Expression ((Projection + Before ORDER BY))
  Aggregating
    Expression (Before GROUP BY)
      Filter (WHERE)
        SettingQuotaAndLimits (Set limits and quota after reading from storage)
          ReadFromMergeTree
          Indexes:
            MinMax
              Keys: 
                created_date
              Condition: (created_date in [19270, 19270])
              Parts: 3/136
              Granules: 346/8838
            Partition
              Keys: 
                toYYYYMM(created_date)
              Condition: (toYYYYMM(created_date) in [202210, 202210])
              Parts: 3/3
              Granules: 346/346
            PrimaryKey
              Condition: true
              Parts: 3/3
              Granules: 346/346

I couldn't find anything in the documentation that Final keyword would prevent clickhouse from utilizing skip indexes. So I'm wondering what is wrong.

Clickhouse Version: 22.3.15.33

Table Engine: Replacing Merge Tree


Solution

  • it's deliberately otherwise you will get an incorrect result https://github.com/ClickHouse/ClickHouse/pull/34243

    there is a setting: use_skip_indexes_if_final

    It is impossible to get the correct result in general if one use SKIP INDEX and FINAL. SKIP INDEX works before FINAL and filters out FINAL rows.

    imagine you have a table

    key   | version      | some_column_which_should_calculated_during_the_final 
    ---------------------------------------------------------------------------
    1     |     1        | true
    1     |     2        | false
    

    select * from .... FINAL where some_column_which_should_calculated_during_the_final=true should return zero rows. Because the final row version=2 is = false

    But if you use skip index it will return 1 | 1 | true because SKIP index filters out version=2, then final will evaluate 1 row instead of 2

    Pipeline

    1. Partition Pruning
    2. Primary index
    3. Skip index / PREWHERE
    4. FINAL
    5. WHERE

    so the same issue with Partition Pruning (if you use some_column_which_should_calculated_during_the_final in PARITION BY)