Search code examples
aggregate-functionsclickhouse

How can I produce an aggregated result with subgroups in Clickhouse?


I want to be able to effectively produce an aggregated result from clickhouse table along with sort of an array with subgroups

So, let me give an example

Say, I have the following exemplary table

Column Type
id String
client String
v1 Int
v2 Int
when DateTime

Starting with simple aggregation query, like

SELECT id, AVG(v1) AVG1, SUM(v2) SUM2 FROM table WHERE When > today() GROUP BY id that will produce something like

ID AVG1 SUM2
1 100 300
2 200 400
... ... ...

I want extend the result with something like this

ID AVG SUM2 Rows per Client
1 100 300 [{AVG1:110, SUM2:150},{AVG1:90, SUM2:50},{AVG1:100, SUM2:100}]
2 200 400 [{200, 100},{200, 300},{200, 0}]
... ... ... [...]

The rows per client field is aggregated with the same filters as the main query, but also applies extra group by to its results

I'm curios, if something like this is even possible in Clickhouse (and, if so, what's the most efficient way to do so), or do I have to use Join and then programmatically parse the results?

Joins are the best I've managed to accomplish so far, but the resulting query wasn't optimal, as I had to select the same data twice (note, that the table and queries I used are indeed exemplary, and the real ones has far more fields and more sophisticated aggregation), and the results are not quite the same to what I'm trying to accomplish


Solution

  • create table I_AM_TIRED_TO_WRITE_EXAMPLES_WHY_ARE_YOU_SO_LAZY(
      id    String,
      client    String,
      v1    Int,
      v2    Int,
    When    DateTime) Engine=Memory;
    
    insert into I_AM_TIRED_TO_WRITE_EXAMPLES_WHY_ARE_YOU_SO_LAZY
    select number, arrayJoin(['client1', 'client2', 'client3']),
         number%10, number%3, today()
    from numbers(15);
    
    
    SELECT
        id,
        avgMerge(AVG1s) AS AVG1,
        sum(SUM2s) AS SUM2,
        CAST(groupArray((client, (finalizeAggregation(AVG1s), SUM2s))), 'Map(String, Tuple(avg Float64, sum Int64))') AS r
    FROM
    (
        SELECT
            id,
            client,
            avgState(v1) AS AVG1s,
            SUM(v2) AS SUM2s
        FROM I_AM_TIRED_TO_WRITE_EXAMPLES_WHY_ARE_YOU_SO_LAZY
        WHERE When >= today()
        GROUP BY
            id,
            client
    )
    GROUP BY id
    ORDER BY id ASC
    
    ┌─id─┬─AVG1─┬─SUM2─┬─r─────────────────────────────────────────────────┐
    │ 0  │    0 │    0 │ {'client2':(0,0),'client3':(0,0),'client1':(0,0)} │
    │ 1  │    1 │    3 │ {'client1':(1,1),'client2':(1,1),'client3':(1,1)} │
    │ 10 │    0 │    3 │ {'client3':(0,1),'client2':(0,1),'client1':(0,1)} │
    │ 11 │    1 │    6 │ {'client3':(1,2),'client2':(1,2),'client1':(1,2)} │
    │ 12 │    2 │    0 │ {'client2':(2,0),'client3':(2,0),'client1':(2,0)} │
    │ 13 │    3 │    3 │ {'client2':(3,1),'client3':(3,1),'client1':(3,1)} │
    │ 14 │    4 │    6 │ {'client3':(4,2),'client2':(4,2),'client1':(4,2)} │
    │ 2  │    2 │    6 │ {'client1':(2,2),'client3':(2,2),'client2':(2,2)} │
    │ 3  │    3 │    0 │ {'client3':(3,0),'client2':(3,0),'client1':(3,0)} │
    │ 4  │    4 │    3 │ {'client1':(4,1),'client3':(4,1),'client2':(4,1)} │
    │ 5  │    5 │    6 │ {'client3':(5,2),'client2':(5,2),'client1':(5,2)} │
    │ 6  │    6 │    0 │ {'client2':(6,0),'client3':(6,0),'client1':(6,0)} │
    │ 7  │    7 │    3 │ {'client1':(7,1),'client2':(7,1),'client3':(7,1)} │
    │ 8  │    8 │    6 │ {'client3':(8,2),'client2':(8,2),'client1':(8,2)} │
    │ 9  │    9 │    0 │ {'client1':(9,0),'client3':(9,0),'client2':(9,0)} │
    └────┴──────┴──────┴───────────────────────────────────────────────────┘
    

    groupArrayMap: https://fiddle.clickhouse.com/383cb12c-0039-401c-a514-1eb333ae5720