I am working on some time-series data, say the memory usage of some application, with Clickhouse. To analyze the historic trending of the data, I store the historic data in a table, with some schema like:
CREATE TABLE memory_usage (
timestamp DateTime
app_id uint64
usage uint64
) ENGINE = MergeTree()
Besides the historic data, I am also interested in the real time data. Since the real time data is accessed frequently, I suppose it's more time-efficient to store it in another table, instead of running complicated query per access. So I'd like some table as:
CREATE TABLE real_time_memory_usage (
app_id uint64
usage uint64
) ENGINE = MergeTree()
And there is only one record for each (app_id, usage) pair, which is the corresponding record in memory_usage
with latest timestamp.
I know this can be done in my application, where I insert a record to memory_usage
and then update real_time_memory_usage
. But since I learned that there are some useful features like materialized view and insert trigger in Clickhouse, I wonder whether I can leave the heavy work to the database, i.e., just inserting a record to memory_usage
and letting the database update real_time_memory_usage
automatically. And if so, what's the optimal way to do this?
Use
CREATE TABLE real_time_memory_usage (
app_id uint64
usage uint64
) ENGINE = ReplacingMergeTree()
ORDER BY app_id;
and always use FINAL
modificator during select
SELECT * FROM real_time_memory_usage FINAL WHERE app_id=XXX
look details in documentations https://clickhouse.com/docs/en/guides/replacing-merge-tree