Friends, the project has a table like this:
CREATE TABLE events_1h
(
`round_time` DateTime,
`dt` UInt8,
`aa_id` UInt64,
`bb_id` UInt64,
`cc_id` UInt64,
`cpu_architecture` String,
`browser_name` String,
`browser_version` String,
`browser_major` String,
`os_name` String,
`os_version` String,
`device_type` String,
`device_vendor` String,
`device_model` String,
`country` FixedString(2),
`city` UInt32,
`aso` UInt32,
`asn` UInt32,
`referer` String,
`request` UInt32,
`answer` UInt32,
`Impression` UInt32,
`Error` UInt32,
`start` UInt32,
// Many other events here
)
ENGINE = SummingMergeTree
PRIMARY KEY (round_time, dt, aa_id, bb_id, cc_id)
ORDER BY (round_time, dt, aa_id, bb_id, cc_id, cpu_architecture, browser_name, browser_version, browser_major, os_name, os_version, device_type, device_vendor, device_model, country, city, aso, asn, referer);
The number of ORDER BY keys for metrics is large. This is necessary for intersecting data in any combinations based on metrics. If we create separate combinations for the required metrics, then event fields are duplicated, and there are also several dozen of them.
The question is whether it makes sense to replace a large ORDER BY with a custom hash field and sum the data based on it, making the current ORDER BY fields simply indexes for faster queries, or if ORDER BY itself is transformed into a hash under the hood and this approach does not make sense. In general, it is interesting to know what practices exist for optimizing such cases. Thank you!
In the future, I expect that the number of metrics will grow as well as the amount of data and this may lead to performance problems.
Custom hash field have no sense, applying SummingMergeTree
have sense when rows after GROUP BY <all_fields_from_ORDER_BY>
will minimum 3-5 time less then rows in original
Better try to use engine=MergeTree and multiple projections for different most used combinations of field with GROUP BY
Look documentation https://clickhouse.com/docs/en/sql-reference/statements/alter/projection#example-pre-aggregation-query
and better move round_time
PRIMARY KEY (dt, aa_id, bb_id, cc_id, round_time)
look documentation https://kb.altinity.com/engines/mergetree-table-engine-family/pick-keys/