Search code examples
clickhouse

Can I create a Materialized View from another Matrialized View in Clickhouse?


The tile pretty much says it. I want to create a Materialized View whose "SELECT" clause SELECTs data from another Materialized View in Clickhouse. I have tried this. The SQL for "createion" fo the two views runs without an error. But upon runtime, the first view is populated, but the second one isn't.

I need to know if I am making a mistake in my SQL or this is just simply not possible.

Here's my two views:

CREATE MATERIALIZED VIEW IF NOT EXISTS production_gross
            ENGINE = ReplacingMergeTree
                ORDER BY (profile_type, reservoir, case_tag, variable_name, profile_phase, well_name, case_name,
                          timestamp) POPULATE
AS
SELECT profile_type,
       reservoir,
       case_tag,
       is_endorsed,
       toDateTime64(endorsement_date / 1000.0, 0) AS endorsement_date,
       endorsed_for_month,
       variable_name,
       profile_phase,
       well_name,
       case_name,
       asset_id,
       toDateTime64(eoh / 1000, 0)                as end_of_history,
       toDateTime64(ts / 1000, 0)                 as timestamp,
       value,                                     -- AS rate,  -- cubic meters per second rate for this month
       value * dateDiff('second',
                        toStartOfMonth(subtractMonths(now(), 1)),
                        toStartOfMonth(now()))    AS volume -- cubic meters volume for this month

FROM (
         SELECT pp.profile_type                                                                   AS profile_type,
                trimBoth(splitByChar('-', case_name)[1])                                          AS reservoir,
                JSONExtractString(cd.data, 'case_data', 'Tags$$Tag')                              AS case_tag,
                JSONExtractString(cd.data, 'case_data', 'Tags$$Endorsed')                         AS is_endorsed,
                -- Endorsement Data, is the timestamp when the user "endorsed" the case
                JSONExtract(cd.data, 'case_data', 'Tags$$EndorsementDate', 'time_stamp', 'Int64') AS endorsement_date,
                -- Endorsement Month is the month of year for which the case was actually endorsed
                JSONExtractString(cd.data, 'case_data', 'Tags$$MonthTags')                        AS endorsed_for_month,
                pp.variable_name                                                                  AS variable_name,
                JSONExtractString(pp.data, 'profile_phase')                                       AS profile_phase,
                JSONExtractString(wd.data, 'name')                                                AS well_name,
                JSONExtractString(cd.data, 'header', 'name')                                      AS case_name,
                -- We might want to have asset id here to use in roll-up
                JSONExtract(cd.data, 'header', 'reservoir_asset_id', 'Int64')                     AS asset_id, -- Asset Id in ARM
                JSONExtract(pp.data, 'end_of_history', 'Int64')                                   AS end_of_history,
                JSONExtract(pp.data, 'values', 'Array(Float64)')                                  AS values,
                JSONExtract(pp.data, 'timestamps', 'Array(Int64)')                                AS timestamps,
                JSONExtract(pp.data, 'end_of_history', 'Int64')                                   AS eoh
         FROM production_profile AS pp
                  INNER JOIN well_data AS wd ON wd.uuid = pp.well_id
                  INNER JOIN case_data AS cd ON cd.uuid = pp.case_id
         )
    ARRAY JOIN
     values AS value,
     timestamps AS ts
;

CREATE MATERIALIZED VIEW IF NOT EXISTS production_volume_actual
            ENGINE = ReplacingMergeTree
                ORDER BY (asset_id,
                          case_tag,
                          variable_name,
                          endorsement_date) POPULATE
AS
SELECT profile_type,
       case_tag,
       is_endorsed,
       endorsement_date,
       endorsed_for_month,
       variable_name,
       profile_phase,
       asset_id,
       sum(volume) AS total_actual_volume
FROM production_gross
WHERE timestamp < end_of_history
GROUP BY profile_type,
         case_tag,
         is_endorsed,
         endorsement_date,
         endorsed_for_month,
         variable_name,
         profile_phase,
         asset_id
ORDER BY asset_id ASC,
         case_tag ASC,
         variable_name ASC,
         endorsement_date ASC
;

As you can see, the second view is an "aggregation" on the first, and that is why I need it. If I want to do the aggregation from scratch, a lot of processes has to be done twice.

Update: I have tried to change the query to the following:

SELECT ...
FROM `.inner.production_gross`
...

Which did not help. This query resulted in the following error:

Code: 60. DB::Exception: Table default.`.inner.production_gross` doesn't exist.

Then, based on the comment by @DennyCrane and using this answer: https://stackoverflow.com/a/67709334/959156, I run this query:

SELECT
    uuid,
    name
FROM system.tables
WHERE database = 'default' AND engine = 'MaterializedView'

Which gave me the uuid of the inner table:

ebab2dc5-2887-4e7d-998d-6acaff122fc7

So, I ran this query:

SELECT ...
FROM `.inner.ebab2dc5-2887-4e7d-998d-6acaff122fc7`

Which resulted in the following error:

Code: 60. DB::Exception: Table default.`.inner.ebab2dc5-2887-4e7d-998d-6acaff122fc7` doesn't exist.

Solution

  • Materialized views work as insert triggers on actual data tables, so your production_volume_actual table has to do a SELECT on a data table, not a "view".

    If you CREATE a materialized view using an ENGINE (and not as TO another data table), ClickHouse actually creates a data table with the name .inner.<mv_name> on older versions (not using an Atomic database engine), or .inner_id.<some UUID>. if using an Atomic or Replicated database engine. So if you change the select in your second view to this "inner" table name, either:

    select from `.inner.production_gross`
    
    select from `.inner_id.<UUID>`  -- note the extra '_id' on 'inner'
    

    It should work.

    This answer can point you to the right UUID.

    At ClickHouse we actually recommend you always create Materialized Views as TO <second_table> to avoid this kind of confusion, and to make operations on <second_table> simpler and more transparent.

    (Thanks to OP Mostafa Zeinali and Denny Crane for the clarification for more recent ClickHouse versions)