Search code examples
database-schemaclickhousematerialized-views

Is there a Clickhouse feature that simplifies this aggregation schema (every record for a week; 5 minute totals for a year)


I have a reporting Clickhouse database that stores a very large amount of DNS traffic logs (big enough that it's only practical to store 2 days of raw query logs). A main table stores records of individual DNS queries, then materialised views aggregate that down into useful chunks for graphing/reporting (eg ip_protocol by query_types by region).

Storing even those aggregates is very large (we can keep about 1 week of data). So for long term (yearly) reporting I also aggregate that down into totals for each 5min block using materialised views with SummingMergeTree & toStartOfFiveMinute().

ie [original data] -> [mat-view SummingMergeTree(counts), TTL 7 DAYS]

[original data] -> [mat-view SummingMergeTree(5minTimestamp, counts), TTL 1 YEAR].

That works, but it means our graphs have to hit two different views (there are a lot of views and a lot of graphs). And it just feels a bit clunky.

Clickhouse is so good at reporting I wonder if there's some aggregation method or storage engine that would let me store this in a single place. Something that could merge over ' toSecond(..) if timestamp < 7 days old, then toStartOfFiveMinute(..) "

Is there some built in feature, or structure I'm unaware of?

Thank you.


Solution

  • You may find useful TTL Group by https://kb.altinity.com/altinity-kb-queries-and-syntax/ttl-group-by-examples It allows to reduce data in a single table.

    Other option is GrapiteMergeTree https://clickhouse.tech/docs/en/engines/table-engines/mergetree-family/graphitemergetree/