Search code examples
greatest-n-per-groupclickhouse

Top N rows by group in ClickHouse


What is the proper way to query top N rows by group in ClickHouse?
Lets take an example of tbl having id2, id4, v3 columns and N=2. I tried the following

SELECT                                                                          
    id2,                                                                        
    id4,                 
    v3 AS v3        
FROM tbl
GROUP BY                 
    id2,                 
    id4                  
ORDER BY v3 DESC                                                                
LIMIT 2 BY                       
    id2,                 
    id4      

but getting error

Received exception from server (version 19.3.4):
Code: 215. DB::Exception: Received from localhost:9000, 127.0.0.1. DB::Exception
: Column v3 is not under aggregate function and not in GROUP BY..

I could put v3 into GROUP BY and it does seems to work, but it is not efficient to group by a metric.

There is any aggregate function, but we actually want all values (limited to 2 by LIMIT BY clause) not any value, so it doesn't sound like to be proper solution here.

SELECT                                                                          
    id2,                                                                        
    id4,                 
    any(v3) AS v3        
FROM tbl
GROUP BY                 
    id2,                 
    id4                  
ORDER BY v3 DESC                                                                
LIMIT 2 BY                       
    id2,                 
    id4      

Solution

  • It can be used aggregate functions like this:

    SELECT
        id2,
        id4,
        arrayJoin(arraySlice(arrayReverseSort(groupArray(v3)), 1, 2)) v3
    FROM tbl
    GROUP BY
        id2,
        id4