Search code examples
sqlaggregate-functionsclickhouse

First element of aggregation in clickhouse


I have table:

CREATE TABLE address(
   city String,
   street String,
   house String,
   model UInt8
) ENGINE=Memory;

For instance, with these data:

Paris, aaa, 12a, 10
Paris, bbb, 13a, 20
Berlin, ccc, 14b, 30
Minsk, ddd, 15b, 40

And I have the following query:

SELECT any(street), any(house)
FROM address WHERE city IN ('Paris', 'Berlin', 'Minsk') 
GROUP BY city, street 
ORDER BY model;

But this query non-deterministic - any can return first or second row... I need a deterministic query, that returns first row from aggregation. How can I do it with ClickHouse?


Solution

  • The argMin function can be used for this with a tuple for 2-nd parameter.
    You may use whatever ordering expression needed instead (like just a single column without round brackets).

    SELECT 
      T.city as city
    , argMin (T.street, (T.street, T.house, T.model)) as street
    , argMin (T.house,  (T.street, T.house, T.model)) as house
    , argMin (T.model,  (T.street, T.house, T.model)) as model
    FROM VALUES 
    (
    '
       city String,
       street String,
       house String,
       model UInt8
    '
    , ('Paris',  'aaa', '12a', 10)
    , ('Paris',  'bbb', '13a', 20)
    , ('Berlin', 'ccc', '14b', 30)
    , ('Minsk',  'ddd', '15b', 40)
    ) T
    WHERE T.city IN ('Paris', 'Berlin', 'Minsk')
    GROUP BY T.city
    ORDER BY model
    
    city street house model
    Paris aaa 12a 10
    Berlin ccc 14b 30
    Minsk ddd 15b 40

    This works because tuples in ClickHouse are comparable (from left to right) which is suitable for ordering and use here.

    SELECT ('Paris',  'aaa', '12a', 10) < ('Paris',  'b', '12a', 0)
    
    less(('Paris', 'aaa', '12a', 10), ('Paris', 'b', '12a', 0))
    1

    Update:
    As for performance. One may compare 2 different approaches of getting the same on their own data.

    -- 7.3 sec
    SELECT 
      T.city                                            as city
    , argMin (T.street, (T.street, T.house, T.model))   as street
    , argMin (T.house,  (T.street, T.house, T.model))   as house
    , argMin (T.model,  (T.street, T.house, T.model))   as model
    FROM
    (
      SELECT
        'City '   || toString (n.number % 10)   as city
      , 'Street ' || toString (n.number % 1000) as street
      , toString (n.number % 100)               as house
      , n.number % 100                          as model
      FROM numbers (1e7) n
    ) T
    GROUP BY T.city
    

    vs

    -- 13.6 sec
    SELECT * EXCEPT rn_
    FROM 
    (
      SELECT 
        T.*
      , ROW_NUMBER () OVER (PARTITION BY T.city ORDER BY T.street, T.house, T.model)
            as rn_
      FROM
      (
      SELECT
        'City '   || toString (n.number % 10)   as city
      , 'Street ' || toString (n.number % 1000) as street
      , toString (n.number % 100)               as house
      , n.number % 100                          as model
      FROM numbers (1e7) n
      ) T
    )
    WHERE rn_ = 1