I want to create a clickhouse table that stores the max value of stats over a period of every one hour. My base table looks like below
CREATE TABLE base_table
(
`time` DateTime,
`stats` Float64
)
ENGINE = MergeTree
ORDER BY time
SETTINGS index_granularity = 8192
The materialized view look like below
CREATE MATERIALIZED VIEW hourly_max_stats_mv TO hourly_max_stats
(
`hour_start` DateTime,
`hour_end` DateTime,
`max_stats` Float64
) AS
SELECT
toStartOfHour(time) AS hour_start,
addHours(toStartOfHour(time), 1) AS hour_end,
max(stats) AS max_stats
FROM base_table
GROUP BY
hour_start,
hour_end
When I insert the below 2 rows in the base table, I am expecting the hourly_max_stats_mv table to contain only 1 row with the max value over an interval of 1 hour.
INSERT INTO base_table (time, stats) VALUES ('2024-01-18 12:30:00', 10);
INSERT INTO base_table (time, stats) VALUES ('2024-01-18 12:40:00', 20);
Instead I see both the records in hourly_max_stats. I checked the clickhouse logs and there are no errors. I even waited for more than an hour to let the background process aggregate the result, but I still see 2 records. Any idea what am I doing wrong?
You are using MergeTree
on your destination table when you actually want to be using AggregatingMergeTree
. Here is what you need (I'm a little confused as to why you need the "end" of an hour so I removed it):
CREATE TABLE hourly_max_stats (
hour_start DateTime,
max_stats SimpleAggregateFunction(max, Float64)
)
ENGINE = AggregatingMergeTree
PRIMARY KEY hour_start;
CREATE MATERIALIZED VIEW hourly_max_stats_mv
TO hourly_max_stats
AS
SELECT
toStartOfHour(time) AS hour_start,
maxSimpleState(stats) AS max_stats
FROM base_table
GROUP BY
hour_start;
INSERT INTO base_table (time, stats) VALUES ('2024-01-18 12:30:00', 10);
INSERT INTO base_table (time, stats) VALUES ('2024-01-18 12:40:00', 20);
SELECT
hour_start,
max(max_stats)
FROM hourly_max_stats
GROUP BY hour_start;
Query id: 194b3229-f94f-4503-abc7-552a11f4fc7a
┌──────────hour_start─┬─max(max_stats)─┐
│ 2024-01-18 12:00:00 │ 20 │
└─────────────────────┴────────────────┘
1 row in set. Elapsed: 0.025 sec.
I explain the details here: https://www.youtube.com/watch?v=-Ma2ohy_6lM&list=PL0Z2YDlm0b3gtIdcZI3B_8bMJclDOvY8s&index=10