Search code examples
sqloraclegreatest-n-per-group

ORACLE - Limit number of rows for specific column value


I have a table like that in my Oracle database. ( version 21C )

UserId          Date           Objective  Value
------------------------------------------------
1        2022-05-19 14:56:41     Test y     22
2        2022-02-19 14:56:41     Test y     44 
1        2022-06-19 14:56:41     Test y     58 
5        2022-07-19 14:56:41     Test y     98 
4        2022-08-19 14:56:41     Test y     25 
3        2022-09-19 14:56:41     Test y     54 
1        2022-10-19 14:56:41     Test y     41 
2        2022-06-09 14:56:41     Test y     52 
2        2022-05-05 14:56:41     Test y     44 
1        2022-04-22 14:56:41     Test y     48 
2        2022-07-19 14:56:41     Test y     88 
1        2022-01-10 14:56:41     Test y     4 
2        2022-03-19 14:56:41     Test y     2 
3        2022-02-19 14:56:41     Test y     44 

What I would like to do is to return rows but with a maximum of 3 rows with the same User Id. If a UserId is present more than 3 times in the table, I need to return the 3 more recent lines ( ordering by date desc ).

For example with the data that I gave I should return a total of 10 lines ( 3 lines for UserId 1 (the 3 more recent), 3 lines for UserId 2 (the 3 more recent), 2 lines for UserId 3 (because only 2 records) , 1 line for UserId 4 (because only 1 record) and 1 line for UserId 5 (because only 1 record) ).

I don't know how to do that in one query if it's possible.

Thank you in advance !


Solution

  • The ROW_NUMBER() function will work best for this. It is similar to RANK/DENSE_RANK except that it will always (arbitrarily) assign a different value to each row even in the event of collisions (same value in the ORDER BY column(s)), so you can easily enforce the "give me at most 3 rows" rule:

    SELECT *
      FROM (SELECT x.*,
                   ROW_NUMBER() OVER (PARTITION BY userid ORDER BY datecol DESC) seq
              FROM tablename x)
     WHERE seq <= 3
    

    The PARTITION BY clause defines the buckets within which the counting is done. For every distinct value of the column(s) specified by PARTITION BY, there is a separate counting (and ordering, if specified) performed. That's how you achieve the "X rows per userid" effect.