Search code examples
clickhouse

How to use object(json) with AggregateFunction?


Basically I have a json field in a table and want to pass the most recent value to AggregatedMergeTree field via materialized view.

Unfortunately, I get the following type error:

 DB::Exception: Conversion from AggregateFunction(argMax, Tuple(_dummy UInt8), DateTime) to AggregateFunction(argMax, Object('json'), DateTime) is not supported: ... (CANNOT_CONVERT_TYPE) (version 23.4.1.1943 (official build))

I guess the reason is that Object('json') is tuple under the hood, but with fluctuating content, and function's signature is somewhat constant.

Nevertheless I'm wondering what's the right way to get around it.

CREATE TABLE raw (
  user_id           String,
  created_at        DateTime,
  value             JSON
)
ENGINE = ReplacingMergeTree()
PRIMARY KEY (created_at, user_id)
CREATE TABLE state (
  user_id           String,
  _value            AggregateFunction(argMax, JSON, DateTime)
)
ENGINE = AggregatingMergeTree()
PRIMARY KEY (user_id)
CREATE MATERIALIZED VIEW state_mv TO state AS
  SELECT
    user_id,
    argMaxState(raw.value, raw.created_at) AS _value
  FROM raw
  GROUP BY user_id

Solution

  • So I came up with the following solution:

    CREATE TABLE raw (
      user_id           String,
      created_at        DateTime,
      expiring_at       DateTime,
      value_raw         String
    )
    ENGINE = ReplacingMergeTree()
    PRIMARY KEY (created_at, user_id, expiring_at)
    
    CREATE TABLE state (
      user_id           String,
      value             JSON DEFAULT cast(finalizeAggregation(_value_raw), 'String'),
      _expiring_at      AggregateFunction(argMax, DateTime, DateTime),
      _value_raw        AggregateFunction(argMax, String, DateTime)
    )
    ENGINE = AggregatingMergeTree()
    PRIMARY KEY (user_id)
    TTL finalizeAggregation(_expiring_at)
    
    CREATE MATERIALIZED VIEW state_mv TO state AS
      SELECT
        user_id,
        argMaxState(raw.expiring_at, raw.created_at) AS _expiring_at,
        argMaxState(raw.value_raw, raw.created_at)   AS _value_raw
      FROM raw
    GROUP BY user_id