I have a table called Master_table
in Clickhouse, I am bale to calculate the size occupied by this table since it is present in system.parts
.
However, I need to calculate the size occupied by only certain rows in this table, which I would filter by some unique IDs.
I am able to check the size occupied by the Mater_table
table using the below query:
SELECT
database,
table,
formatReadableSize(sum(data_compressed_bytes) AS size) AS compressed,
formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed
FROM system.parts
WHERE (active = 1) AND (table = 'Master_table')
GROUP BY
database,
table
ORDER BY size DESC;
Data in ClickHouse is stored in column files, so looking at the space of a subset of rows is not going to be an exact science. However, you could estimate it by assuming that storage of each column is the same. With that assumption, you could just figure out the rows you're interested in is as a percentage of the overall dataset.
I tried it out with a table named flights
and looked at flights from American Airlines (the airline
column was equal to "AA"):
WITH
(
SELECT countIf(airline='AA')/count() FROM flights
) AS fraction
SELECT
database,
`table`,
formatReadableSize(sum(data_compressed_bytes) AS size) AS compressed,
formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed,
fraction,
formatReadableSize(size * fraction) AS subset_usage
FROM system.parts
WHERE (active = 1) AND (`table` = 'flights')
GROUP BY
database,
`table`
ORDER BY size DESC
Here's the response:
┌─database─┬─table───┬─compressed─┬─uncompressed─┬────────────fraction─┬─subset_usage─┐
│ default │ flights │ 105.09 MiB │ 173.29 MiB │ 0.13900430815375672 │ 14.61 MiB │
└──────────┴─────────┴────────────┴──────────────┴─────────────────────┴──────────────┘