I'm using ClickHouse as a kind of updatable large table (hundreds of millions of rows) with ReplacingMergeTree. I need to upsert in batches and run some non-aggregated selects. It works fine.
Even though it's a bit of a hack and far from optimal (this is not the OLAP workload ClickHouse is built for), it scales quite well and, for my needs, still performs faster than systems more or less dedicated to this, like HBase or RDBMSs.
I use a ReplacingMergeTree table with a key:
CREATE TABLE Things (Key Int32, ValueA Int32, ValueB Int32)
ENGINE = ReplacingMergeTree() ORDER BY Key
I upsert with:
INSERT INTO Things (Key,ValueA,ValueB) ...
And select with the FINAL modifier:
SELECT Key,ValueA,ValueB FROM Things FINAL WHERE ...
I can "delete" objects by using a column named "Killed". But from time to time, I need to purge "Killed" objects to prevent the table from growing endlessly.
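For clarity, here is a sketch of the tombstone pattern I mean, assuming the schema above is extended with a `Killed UInt8` flag (the column name and the example values are mine):

```sql
-- Schema extended with a tombstone flag (hypothetical variant of my table)
CREATE TABLE Things (Key Int32, ValueA Int32, ValueB Int32, Killed UInt8 DEFAULT 0)
ENGINE = ReplacingMergeTree() ORDER BY Key;

-- "Delete" a row by upserting a tombstoned version with the same key;
-- ReplacingMergeTree keeps the latest inserted row per key after merges
INSERT INTO Things (Key, ValueA, ValueB, Killed) VALUES (42, 0, 0, 1);

-- Filter tombstones out at read time
SELECT Key, ValueA, ValueB FROM Things FINAL WHERE Killed = 0;
```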
The only method I found was recreating a new table and inserting in it non killed rows. Is there a smarter way?
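The rebuild I'm doing today looks roughly like this (table names are placeholders; `RENAME TABLE` swaps the tables atomically):

```sql
-- Recreate the table and keep only live rows (assumes a Killed flag column)
CREATE TABLE Things_new AS Things;
INSERT INTO Things_new SELECT * FROM Things FINAL WHERE Killed = 0;
RENAME TABLE Things TO Things_old, Things_new TO Things;
DROP TABLE Things_old;
```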
Recent ClickHouse releases support DML operations (mutations), so you no longer need ReplacingMergeTree to tombstone records like that.
Check out https://clickhouse.yandex/docs/en/query_language/alter/#mutations for more details.
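Applied to your case, cleaning up tombstoned rows could be a single mutation instead of a table rebuild (assuming your `Killed` flag column; note that mutations run asynchronously and rewrite the affected data parts):

```sql
-- Delete tombstoned rows in place; the mutation is applied in the background
ALTER TABLE Things DELETE WHERE Killed = 1;
```

You can check progress in the `system.mutations` table.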