Search code examples
sqldatabaseclickhouse

Calculate the size occupied by a filtered table not directly present in system.parts in ClickHouse


I have a table called Master_table in Clickhouse, I am bale to calculate the size occupied by this table since it is present in system.parts.

However, I need to calculate the size occupied by only certain rows in this table, which I would filter by some unique IDs.

I am able to check the size occupied by the Mater_table table using the below query:

SELECT
    database,
    table,
    formatReadableSize(sum(data_compressed_bytes) AS size) AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed   
FROM system.parts
WHERE (active = 1) AND (table = 'Master_table')
GROUP BY
    database,
    table
ORDER BY size DESC;

Solution

  • Data in ClickHouse is stored in column files, so looking at the space of a subset of rows is not going to be an exact science. However, you could estimate it by assuming that storage of each column is the same. With that assumption, you could just figure out the rows you're interested in is as a percentage of the overall dataset.

    I tried it out with a table named flights and looked at flights from American Airlines (the airline column was equal to "AA"):

    WITH
        (
            SELECT countIf(airline='AA')/count() FROM flights
        ) AS fraction
    SELECT
        database,
        `table`,
        formatReadableSize(sum(data_compressed_bytes) AS size) AS compressed,
        formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed,
        fraction,
        formatReadableSize(size * fraction) AS subset_usage
    FROM system.parts
    WHERE (active = 1) AND (`table` = 'flights')
    GROUP BY
        database,
        `table`
    ORDER BY size DESC
    

    Here's the response:

    ┌─database─┬─table───┬─compressed─┬─uncompressed─┬────────────fraction─┬─subset_usage─┐
    │ default  │ flights │ 105.09 MiB │ 173.29 MiB   │ 0.13900430815375672 │ 14.61 MiB    │
    └──────────┴─────────┴────────────┴──────────────┴─────────────────────┴──────────────┘