ClickHouse version I am using: 22.11-alpine
I use the Null table engine in my project as a temporary table from which data is sent to a ReplicatedAggregatingMergeTree.
The following test case demonstrates the problem:
CREATE TABLE default.data ON CLUSTER '{cluster}'
(
id Int8,
group_by String,
value Int8
)
ENGINE = Null;
CREATE TABLE default.data_agg ON CLUSTER '{cluster}'
(
group_by String,
value AggregateFunction(sum, Int8)
)
ENGINE = ReplicatedAggregatingMergeTree
ORDER BY group_by;
CREATE MATERIALIZED VIEW default.data_agg_mv ON CLUSTER '{cluster}'
TO default.data_agg AS
SELECT
group_by,
sumState(value) as value
FROM default.data
GROUP BY group_by;
CREATE TABLE default.data_agg_dis ON CLUSTER '{cluster}'
AS default.data_agg
ENGINE = Distributed('{cluster}', default, data_agg, rand());
All these tables are created on a cluster (using ClickHouse Keeper) with 4 nodes, 4 shards, and 2 replicas.
Problem:
INSERT INTO default.data VALUES (1, 'group1', 1);
The first time, the insert query above increments the value in data_agg_dis correctly:
SELECT group_by, sumMerge(value) as value FROM default.data_agg_dis GROUP BY group_by;
Results:
group_by | value
group1 | 1
But when I repeat the same insert for group_by: group1, the value only reaches 4. It feels like once all shards have been touched, every further aggregate is skipped.
But if I change the value for group_by: group1 in the insert query from 1 to 2:
INSERT INTO default.data VALUES (1, 'group1', 2);
then the value in data_agg_dis again increments by 2, four times, and stops. This behavior is reproducible only on the cluster; on a single-node ClickHouse server everything works as expected.
Is anybody aware of the same issue? Is it some limitation of ClickHouse clusters, or a bug?
Your ReplicatedAggregatingMergeTree just applies block-level deduplication, because you are trying to INSERT the same data. See https://clickhouse.com/docs/en/operations/settings/settings/#settings-insert-deduplicate for details.
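If you really do want identical inserts to be aggregated rather than deduplicated, a minimal sketch (assuming the schema from the question, and that the dropped rows are indeed caused by block deduplication in the Replicated target) is to disable insert deduplication, either for the session or for a single INSERT:

-- Disable block deduplication for the whole session
SET insert_deduplicate = 0;
INSERT INTO default.data VALUES (1, 'group1', 1);

-- Or only for one INSERT, without changing session settings
INSERT INTO default.data SETTINGS insert_deduplicate = 0 VALUES (1, 'group1', 1);

With deduplication disabled, each insert produces a new block, so repeating the same row keeps increasing the sum; the trade-off is that genuine retries after a failed insert are no longer deduplicated either.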