The documentation for the Buffer table engine, when used in front of a replicated destination table, contains the following warning:
https://clickhouse.com/docs/en/engines/table-engines/special/buffer/
"If the destination table is replicated, some expected characteristics of replicated tables are lost when writing to a Buffer table. The random changes to the order of rows and sizes of data parts cause data deduplication to quit working, which means it is not possible to have a reliable ‘exactly once’ write to replicated tables."
From my understanding of how replicated tables apply block-level deduplication (*), this would imply that writes occur at least once (duplicates are possible, but nothing is lost).
Is this correct? Or is there a possibility that writes might be lost under rare circumstances?
(*)
https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication/
https://kb.altinity.com/altinity-kb-schema-design/insert_deduplication/
You lose two things with Buffer tables. One is the "automatic" deduplication that kicks in when a client inserts exactly the same block more than once. Losing it opens the possibility of duplicate data if the client believes a write failed when it in fact succeeded and retries the same batch. That can normally be managed at the client level.
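For context, this is roughly what the behavior being lost looks like when inserting directly into a replicated table with no Buffer in front. The table name, ZooKeeper path, and macros below are illustrative placeholders, not anything from the original question:

```sql
-- Hypothetical ReplicatedMergeTree table (ZooKeeper path and replica
-- macro are placeholders for whatever the cluster actually uses).
CREATE TABLE events
(
    ts DateTime,
    id UInt64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY (ts, id);

-- Inserting the exact same block twice: the second INSERT is silently
-- deduplicated by its block checksum, so a client retry after an
-- ambiguous failure is safe.
INSERT INTO events VALUES ('2024-01-01 00:00:00', 1), ('2024-01-01 00:00:00', 2);
INSERT INTO events VALUES ('2024-01-01 00:00:00', 1), ('2024-01-01 00:00:00', 2);
```

With a Buffer table in between, rows from many inserts are re-batched before reaching the replicated table, so a retried batch no longer arrives as a byte-identical block and this checksum-based protection stops applying.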
The other is the possibility of data loss if something happens to the ClickHouse server while data sits only in the in-memory Buffer table, before one of the flush conditions is met. insert_quorum only applies to ReplicatedMergeTree tables, not Buffer tables, so until the flush there is only one copy of the data, in memory, that has already been acknowledged to the client as "written" but has not yet been stored on disk or replicated. Using Buffer tables means accepting the possibility of this data loss if the server crashes for some reason.
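To make that flush window concrete, here is a sketch of a Buffer table definition (the database and table names are made up for illustration). Rows live only in this node's memory until the thresholds trigger a flush to the destination table:

```sql
-- Buffer in front of a hypothetical replicated table db.events.
-- Signature: Buffer(database, table, num_layers,
--                   min_time, max_time, min_rows, max_rows,
--                   min_bytes, max_bytes)
-- A buffer layer is flushed when all of the min_* conditions are met,
-- or as soon as any max_* condition is exceeded: here, between 10 and
-- 100 seconds, 10 thousand and 1 million rows, 10 MB and 100 MB.
CREATE TABLE db.events_buffer AS db.events
ENGINE = Buffer(db, events, 16,
                10, 100,
                10000, 1000000,
                10000000, 100000000);
```

Anything sitting in those 16 in-memory layers at the moment of a crash is gone, even though every INSERT into db.events_buffer was already acknowledged to the client.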