I have table:
CREATE TABLE address(
city String,
street String,
house String,
model UInt8
) ENGINE=Memory;
For instance, with these data:
Paris, aaa, 12a, 10
Paris, bbb, 13a, 20
Berlin, ccc, 14b, 30
Minsk, ddd, 15b, 40
And I have the following query:
SELECT any(street), any(house)
FROM address WHERE city IN ('Paris', 'Berlin', 'Minsk')
GROUP BY city, street
ORDER BY model;
But this query non-deterministic - any can return first or second row... I need a deterministic query, that returns first row from aggregation. How can I do it with ClickHouse?
The argMin function can be used for this with a tuple for 2-nd parameter.
You may use whatever ordering expression needed instead (like just a single column without round brackets).
SELECT
T.city as city
, argMin (T.street, (T.street, T.house, T.model)) as street
, argMin (T.house, (T.street, T.house, T.model)) as house
, argMin (T.model, (T.street, T.house, T.model)) as model
FROM VALUES
(
'
city String,
street String,
house String,
model UInt8
'
, ('Paris', 'aaa', '12a', 10)
, ('Paris', 'bbb', '13a', 20)
, ('Berlin', 'ccc', '14b', 30)
, ('Minsk', 'ddd', '15b', 40)
) T
WHERE T.city IN ('Paris', 'Berlin', 'Minsk')
GROUP BY T.city
ORDER BY model
city | street | house | model |
---|---|---|---|
Paris | aaa | 12a | 10 |
Berlin | ccc | 14b | 30 |
Minsk | ddd | 15b | 40 |
This works because tuples in ClickHouse are comparable (from left to right) which is suitable for ordering and use here.
SELECT ('Paris', 'aaa', '12a', 10) < ('Paris', 'b', '12a', 0)
less(('Paris', 'aaa', '12a', 10), ('Paris', 'b', '12a', 0)) |
---|
1 |
Update:
As for performance. One may compare 2 different approaches of getting the same on their own data.
-- 7.3 sec
SELECT
T.city as city
, argMin (T.street, (T.street, T.house, T.model)) as street
, argMin (T.house, (T.street, T.house, T.model)) as house
, argMin (T.model, (T.street, T.house, T.model)) as model
FROM
(
SELECT
'City ' || toString (n.number % 10) as city
, 'Street ' || toString (n.number % 1000) as street
, toString (n.number % 100) as house
, n.number % 100 as model
FROM numbers (1e7) n
) T
GROUP BY T.city
vs
-- 13.6 sec
SELECT * EXCEPT rn_
FROM
(
SELECT
T.*
, ROW_NUMBER () OVER (PARTITION BY T.city ORDER BY T.street, T.house, T.model)
as rn_
FROM
(
SELECT
'City ' || toString (n.number % 10) as city
, 'Street ' || toString (n.number % 1000) as street
, toString (n.number % 100) as house
, n.number % 100 as model
FROM numbers (1e7) n
) T
)
WHERE rn_ = 1