Search code examples
group-byclickhouse

ClickHouse: Efficient way to aggregate data by different time ranges at


I need to aggregate time-series data (with average functions) on different timeslots, like:

  • Today

  • last X days

  • Last weekend

  • This week

  • Last X weeks

  • This month

  • List item

    etc...

Q1: Can it be done within GROUP BY statement or at least with a single query?

Q2: Do I need any Materialized View for that?

The table is partitioned by Month and sharded by UserID

All queries are within UserID (single shard)


Solution

  • group by with ROLLUP

    create table xrollup(metric Int64, b date, v Int64 ) engine=MergeTree partition by tuple() order by tuple();
    insert into xrollup values (1,'2018-01-01', 1), (1,'2018-01-02', 1), (1,'2018-02-01', 1), (1,'2017-03-01', 1);
    insert into xrollup values (2,'2018-01-01', 1), (2,'2018-02-02', 1);
    
    SELECT metric, toYear(b) y, toYYYYMM(b) m, SUM(v) AS val
    FROM   xrollup
    GROUP BY metric, y, m  with ROLLUP
    ORDER BY metric, y, m 
    
    ┌─metric─┬────y─┬──────m─┬─val─┐
    │      0 │    0 │      0 │   6 │  overall
    │      1 │    0 │      0 │   4 │  overall by metric1
    │      1 │ 2017 │      0 │   1 │  overall by metric1 for 2017
    │      1 │ 2017 │ 201703 │   1 │  overall by metric1 for march 2017
    │      1 │ 2018 │      0 │   3 │
    │      1 │ 2018 │ 201801 │   2 │
    │      1 │ 2018 │ 201802 │   1 │
    │      2 │    0 │      0 │   2 │
    │      2 │ 2018 │      0 │   2 │
    │      2 │ 2018 │ 201801 │   1 │
    │      2 │ 2018 │ 201802 │   1 │
    └────────┴──────┴────────┴─────┘