Search code examples
databaseperformanceindexingquery-optimizationclickhouse

ClickHouse Index Not getting Utilised for Existing Data(scanning all the rows post creating the index on the existing table)


I am facing an issue with ClickHouse indexing where the index is not being utilized when querying existing data. I have created an index on a specific column, but queries do not seem to use it effectively for existing records. The index works fine when created alongside the table but not for existing data. Here are the details of my setup:

  1. I have a table named devices (that is partitioned on date field and uses ReplacingMergeTree Engine)

  2. I have created an index using the following query:

ALTER TABLE devices
    ADD INDEX device_type (properties_device_type) TYPE set(20) GRANULARITY 2;
  1. I have a query that I expect to use this index:
SELECT *
FROM devices
WHERE properties_device_type = 'Android';

Despite creating the index, the query still scans all rows, which is not efficient.

If I am creating the Index post inserting the data then it is working as expected and is scanning only limited rows but as per my use case I have time series data and I want to use the index on top of that to optimise the data scanning process.

What could be causing the index not to be utilised for existing data? Are there any specific considerations for using indexes with existing data in ClickHouse? How can I optimize my setup to ensure the index is used effectively for queries on existing records?

Any insights, recommendations, or guidance would be greatly appreciated. Thank you!

I've also tried using the USE INDEX in the query, but it did not resolve the issue(Seems to be unsupported)

SELECT *
FROM devices
WHERE properties_device_type = 'Android'
USE INDEX (properties_device_type);

I even tried EXPLAIN query to check if myIndex is getting used, but

EXPLAIN SELECT *
FROM devices
WHERE properties_device_type = 'Android'

Response:

explain query response

I have considered factors like data distribution, query patterns, and data types, but the index is not being used as expected.


Solution

  • When you define a new skipping index (or projection) on an existing table, ClickHouse does not go back and build the index on already existing parts. You need to wait for a merge, or better yet materialize the index:

    ALTER TABLE devices
        MATERIALIZE INDEX device_type;