Search code examples
sqlclickhouse

How to automatically update latest data upon insertion in Clickhouse?


I am working on some time-series data, say the memory usage of some application, with Clickhouse. To analyze the historic trending of the data, I store the historic data in a table, with some schema like:

CREATE TABLE memory_usage (
  timestamp DateTime
  app_id uint64
  usage uint64
) ENGINE = MergeTree()

Besides the historic data, I am also interested in the real time data. Since the real time data is accessed frequently, I suppose it's more time-efficient to store it in another table, instead of running complicated query per access. So I'd like some table as:

CREATE TABLE real_time_memory_usage (
  app_id uint64
  usage uint64
) ENGINE = MergeTree()

And there is only one record for each (app_id, usage) pair, which is the corresponding record in memory_usage with latest timestamp.

I know this can be done in my application, where I insert a record to memory_usage and then update real_time_memory_usage. But since I learned that there are some useful features like materialized view and insert trigger in Clickhouse, I wonder whether I can leave the heavy work to the database, i.e., just inserting a record to memory_usage and letting the database update real_time_memory_usage automatically. And if so, what's the optimal way to do this?


Solution

  • Use

    CREATE TABLE real_time_memory_usage (
      app_id uint64
      usage uint64
    ) ENGINE = ReplacingMergeTree()
    ORDER BY app_id;
    

    and always use FINAL modificator during select SELECT * FROM real_time_memory_usage FINAL WHERE app_id=XXX

    look details in documentations https://clickhouse.com/docs/en/guides/replacing-merge-tree