Relationship between primary_key_bytes_in_memory and mark cache size

I am trying to understand the metrics around the mark cache on an AggregatingMergeTree on 21.8-altinitystable.

What is the difference between these columns on the system.parts table? primary_key_bytes_in_memory and primary_key_bytes_in_memory_allocated? Do they represent the portion of mark_bytes that are in memory in the mark cache?

Are they related in any way with the MarkCacheBytes metric in the system.asynchronous_metrics table? I have a 4 GB mark cache size, and MarkCacheBytes shows it being completely used, but the sums of primary_key_bytes_in_memory and primary_key_bytes_in_memory_allocated across tables and parts are much lower (roughly 1 GB and 2 GB respectively).

Thanks Filippo

Solution

Sorry for the previous answer.

I will try to explain in more detail:

What is the difference between these columns on the system.parts table? primary_key_bytes_in_memory and primary_key_bytes_in_memory_allocated?

According to the source https://github.com/ClickHouse/ClickHouse/blob/229d35408b61a814dc1cb5a4cefcfa852efa13fe/src/Storages/System/StorageSystemParts.cpp#L181-L184

primary_key_bytes_in_memory is the size of primary.idx as loaded into memory.

primary_key_bytes_in_memory_allocated is the memory actually allocated for it. When primary.idx is loaded into memory it is split by columns, and the memory allocated during that split is slightly larger than the raw size.

Do they represent the portion of mark_bytes that are in memory in the mark cache?

No, they represent only the in-memory representation of primary.idx for the selected part.

Are they related in any way with the MarkCacheBytes metric in the system.asynchronous_metrics table

No, the fields above are not related to the mark cache. The mark cache metrics cover only the <column_name>.mrk2 files loaded into memory, plus the cache hit and cache miss counters for that cache.

Every record in primary.idx contains the values of the primary key fields and the granule number. There is one record per granule, and a granule is each run of 8192 rows in the raw data.
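To make the "one record per granule" idea concrete, here is a minimal Python sketch of a sparse index over sorted keys. It is only an illustration of the concept, not ClickHouse's actual implementation: the function names build_sparse_index and granules_for_key are hypothetical, and the index is assumed to hold just the key of the first row of every granule.

```python
from bisect import bisect_left, bisect_right

GRANULE_ROWS = 8192  # rows per granule, as described above


def build_sparse_index(sorted_keys, granule_rows=GRANULE_ROWS):
    """Keep the key of the first row of each granule (one record per granule)."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_rows)]


def granules_for_key(index, key):
    """Return the granule numbers that may contain `key`.

    Granule g covers keys from index[g] up to index[g + 1], so a key equal to
    a granule's first key may also sit at the end of the previous granule.
    """
    first = max(bisect_left(index, key) - 1, 0)
    last = bisect_right(index, key) - 1
    return list(range(first, last + 1))


if __name__ == "__main__":
    keys = list(range(20000))          # already sorted, like data in a part
    index = build_sparse_index(keys)   # 3 records for 20000 rows
    print(len(index), granules_for_key(index, 10000))
```

The point is that the index holds one small record per 8192 rows, which is why the primary key memory reported in system.parts is so much smaller than the data itself.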

Every record in <column_name>.mrk2 contains the offset in the compressed file <column_name>.bin where the block begins, the offset inside the decompressed block, and the number of rows of <column_name> contained in the granule.

I hope this helps you figure it out.
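As a rough illustration of that record layout, the Python sketch below reads such records from raw bytes. It rests on an assumption not stated above: that each record is three little-endian unsigned 64-bit integers (24 bytes) and that the file is not compressed. The helper names parse_mrk2 and estimate_marks_bytes are hypothetical, and the size estimate ignores any extra trailing mark.

```python
import struct

# Assumed layout of one mark: three little-endian UInt64 values:
# (offset in compressed .bin file, offset in decompressed block, rows in granule)
MARK = struct.Struct("<QQQ")


def parse_mrk2(data: bytes):
    """Return a list of (compressed_offset, decompressed_offset, rows) tuples."""
    if len(data) % MARK.size != 0:
        raise ValueError("length is not a multiple of the assumed mark size")
    return [MARK.unpack_from(data, pos) for pos in range(0, len(data), MARK.size)]


def estimate_marks_bytes(rows: int, granule_rows: int = 8192) -> int:
    """Estimate the mark bytes for one column: one mark per granule."""
    marks = -(-rows // granule_rows)  # ceiling division
    return marks * MARK.size


if __name__ == "__main__":
    # Synthetic data standing in for a real <column_name>.mrk2 file.
    sample = MARK.pack(0, 0, 8192) + MARK.pack(0, 32768, 8192)
    for mark in parse_mrk2(sample):
        print(mark)
    print(estimate_marks_bytes(1_000_000))
```

Since there is one such file per column per part, the mark cache can grow much larger than the primary key memory, which has only one primary.idx per part.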